Code Monkey home page Code Monkey logo

moz-sql-parser's Introduction

NOTICE - June 2021

The Moz SQL Parser repository is now archived. But the good news is that a fork of this project, called More SQL Parsing!, remains under active development independent of Mozilla. That project can be found at https://github.com/klahnakoski/mo-sql-parsing.

Moz SQL Parser

Let's make a SQL parser so we can provide a familiar interface to non-sql datastores!

Branch Status
master Build Status
dev Build Status

Problem Statement

SQL is a familiar language used to access databases. Although, each database vendor has its quirky implementation, the average developer does not know enough SQL to be concerned with those quirks. This familiar core SQL (lowest common denominator, if you will) is useful enough to explore data in primitive ways. It is hoped that, once programmers have reviewed a datastore with basic SQL queries, and they see the value of that data, they will be motivated to use the datastore's native query format.

Objectives

The primary objective of this library is to convert some subset of SQL-92 queries to JSON-izable parse trees. A big enough subset to provide superficial data access via SQL, but not so much as we must deal with the document-relational impedance mismatch.

Non-Objectives

  • No plans to provide update statements, like update or insert
  • No plans to expand the language to all of SQL:2011
  • No plans to provide data access tools

Project Status

There are over 400 tests. This parser is good enough for basic usage, including inner queries.

You can see the parser in action at https://sql.telemetry.mozilla.org/ while using the ActiveData datasource

Install

pip install moz-sql-parser

Parsing SQL

>>> from moz_sql_parser import parse
>>> import json
>>> json.dumps(parse("select count(1) from jobs"))
'{"select": {"value": {"count": 1}}, "from": "jobs"}'

Each SQL query is parsed to an object: Each clause is assigned to an object property of the same name.

>>> json.dumps(parse("select a as hello, b as world from jobs"))
'{"select": [{"value": "a", "name": "hello"}, {"value": "b", "name": "world"}], "from": "jobs"}'

The SELECT clause is an array of objects containing name and value properties.

Recursion Limit

WARNING! There is a recursion limit of 1500. This prevents parsing of complex expressions or deeply nested nested queries. You can increase the recursion limit after you have imported moz_sql_parser, and before you parse:

>>> from moz_sql_parser import parse
>>> sys.setrecursionlimit(3000)
>>> parse(complicated_sql)

Generating SQL

You may also generate SQL from the a given JSON document. This is done by the formatter, which is still incomplete (Jan2020).

>>> from moz_sql_parser import format
>>> format({"from":"test", "select":["a.b", "c"]})
'SELECT a.b, c FROM test'

Contributing

In the event that the parser is not working for you, you can help make this better but simply pasting your sql (or JSON) into a new issue. Extra points if you describe the problem. Even more points if you submit a PR with a test. If you also submit a fix, then you also have my gratitude.

Run Tests

See the tests directory for instructions running tests, or writing new ones.

More about implementation

SQL queries are translated to JSON objects: Each clause is assigned to an object property of the same name.

# SELECT * FROM dual WHERE a>b ORDER BY a+b
{
    "select": "*", 
    "from": "dual", 
    "where": {"gt": ["a", "b"]}, 
    "orderby": {"value": {"add": ["a", "b"]}}
}

Expressions are also objects, but with only one property: The name of the operation, and the value holding (an array of) parameters for that operation.

{op: parameters}

and you can see this pattern in the previous example:

{"gt": ["a","b"]}

Notes

moz-sql-parser's People

Contributors

alberto15romero avatar amolk avatar betodealmeida avatar cknowles-admin avatar diggzhang avatar ealter avatar johnatannvmd avatar klahnakoski avatar mknorps avatar mozilla-github-standards avatar nishikeshkardak avatar pydolan avatar sam-smo avatar sonalisinghal avatar thrbowl avatar todpole3 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

moz-sql-parser's Issues

Support different kinds of SQL joins in the format

Current format implementation does not support different kinds of SQL joins. There are also skipped the test for this.

Example:

parsed = parse('SELECT t1.field1 FROM t1 LEFT JOIN t2 ON t1.id = t2.id')
# {'select': {'value': 't1.field1'}, 'from': ['t1', {'left join': 't2', 'on': {'eq': ['t1.id', 't2.id']}}]}
format(parsed) # raise key error

Support ORDER BY ASC

I note that this library is intended for a small basic subset of SQL. I'm interested in adding the two pieces of functionality (support Identifier LIKE Pattern and explicit ASC) as it is necessary for my use case. Would such a contribution be welcome?

Subqueries fail to parse

example of query with subquery that fails to parse

SELECT b.a
FROM (
  SELECT 2 AS a
) b

stacktrace

In [148]: moz_sql_parser.parse(sql)
---------------------------------------------------------------------------
ParseException                            Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/moz_sql_parser/__init__.py in parse(sql)
     25     try:
---> 26         parse_result = SQLParser.parseString(sql, parseAll=True)
     27     except Exception as e:

/usr/local/lib/python3.6/site-packages/pyparsing.py in parseString(self, instring, parseAll)
   1631                 # catch and re-raise exception from here, clears out pyparsing internal stack trace
-> 1632                 raise exc
   1633         else:

/usr/local/lib/python3.6/site-packages/pyparsing.py in parseString(self, instring, parseAll)
   1625                 se = Empty() + StringEnd()
-> 1626                 se._parse( instring, loc )
   1627         except ParseBaseException as exc:

/usr/local/lib/python3.6/site-packages/pyparsing.py in _parseCache(self, instring, loc, doActions, callPreParse)
   1528                 try:
-> 1529                     value = self._parseNoCache(instring, loc, doActions, callPreParse)
   1530                 except ParseBaseException as pe:

/usr/local/lib/python3.6/site-packages/pyparsing.py in _parseNoCache(self, instring, loc, doActions, callPreParse)
   1378                 try:
-> 1379                     loc,tokens = self.parseImpl( instring, preloc, doActions )
   1380                 except IndexError:

/usr/local/lib/python3.6/site-packages/pyparsing.py in parseImpl(self, instring, loc, doActions)
   3394             else:
-> 3395                 loc, exprtokens = e._parse( instring, loc, doActions )
   3396             if exprtokens or exprtokens.haskeys():

/usr/local/lib/python3.6/site-packages/pyparsing.py in _parseCache(self, instring, loc, doActions, callPreParse)
   1528                 try:
-> 1529                     value = self._parseNoCache(instring, loc, doActions, callPreParse)
   1530                 except ParseBaseException as pe:

/usr/local/lib/python3.6/site-packages/pyparsing.py in _parseNoCache(self, instring, loc, doActions, callPreParse)
   1382             else:
-> 1383                 loc,tokens = self.parseImpl( instring, preloc, doActions )
   1384

/usr/local/lib/python3.6/site-packages/pyparsing.py in parseImpl(self, instring, loc, doActions)
   3182         if loc < len(instring):
-> 3183             raise ParseException(instring, loc, self.errmsg, self)
   3184         elif loc == len(instring):

ParseException: Expected end of text (at char 13), (line:3, col:1)

During handling of the above exception, another exception occurred:

ParseException                            Traceback (most recent call last)
<ipython-input-148-4540f730d868> in <module>()
----> 1 moz_sql_parser.parse(sql)

/usr/local/lib/python3.6/site-packages/moz_sql_parser/__init__.py in parse(sql)
     33                 if not f.startswith("{")
     34             ]
---> 35             raise ParseException(sql, e.loc, "Expecting one of (" + (", ".join(expecting)) + ")")
     36         raise
     37     return _scrub(parse_result)

ParseException: Expecting one of (float, select, and, left outer join, then, in, end, group by, lt, as, case, full join, left join, else, offset, full outer join, or, when, concat, union all, div, Found unwanted token, {and | as | asc | between | case | collate nocase | cross join | desc | else | end | from | full join | full outer join | group by | having | in | inner join | is | join | left join | left outer join | limit | offset | like | on | or | order by | right join | right outer join | select | then | union | union all | when | where | with}, like, string, limit, "-", from, collate nocase, desc, asc, "(", cross join, gt, order by, add, is, right outer join, join, right join, union, func_param, having, eq, "not", null, with, between, mul, int, where, inner join, lte, on, "distinct", neq, gte, sub) (at char 13), (line:3, col:1)

Formatter does not pass new test suite

The new TestFormatAndParse suite has 49 failed tests, and a few skipped tests that may also fail.

For example, test_format_and_parse.TestFormatAndParse.test169() fails because moz_sql_parser.formatting.orderby() is expecting a dict, but should handle lists of dicts too.

We can skip the broken tests for now, but this issue will stay open until this suite passes.

Usage of "NOT" before operators fails (e.g., "NOT LIKE", "NOT IN", etc)

A query such as SELECT col1 FROM table1 WHERE col0 NOT IN (1, 2) will fail with a ParseException("expecting one of ..."), however, SELECT col1 FROM table1 WHERE NOT col0 IN (1, 2) will parse okay.

A solution to #48 might address this, but as an alternative, I'm thinking we can simply create a few new keywords to address this ("not in", "not between", "not like", etc.). Thoughts?

Lists of literals are collapsed into one literal block, unless there is one non-literal in the list.

Literals lists do not seem to be consequently collapsed.

a IN ('abc','def') parses to {'in': ['a', {'literal': ['abc', 'def']}]} while one would expect {'in': ['a', [{'literal': 'abc'}, {'literal': 'def'}]]}
while
a IN ('abc',3,'def') parses to {'in': ['a', [{'literal': 'abc'}, 3, {'literal': 'def'}]]}

appending more string literals to the end will keep them separated. This seems like a non-intentional feature. The same happens with function arguments. When using the parser to create tree, this requires an extra preprocessing step, to de-collapse and expand the literal objects.

Python3 support

mo_testing also needs to be fixed for python3, otherwise it looks like the changes are trivial.

Add query validator

It would be a good enhancement to add a query validator. Sometimes user fires a query with some mismatch in spelling #70, sometimes single quotes insteed of double quotes #63, lots of possibilities are present. So we can make a Query validator, which first check the query. If the query is wrong or something mistake in spelling then it will throw an InvalidQueryError and points to recheck the query with explanation.

Recursion Depth

I encountered a tricky issue when parsing SQL queries with nested parenthesis in logic expressions.

For example, when I use the default code to parse the query

SELECT * FROM A WHERE false and ((not ((true) or (false))))

I encounter the error RecursionError: maximum recursion depth exceeded while calling a Python object.

The issue can be resolved by lifting the system recursion limit to 10000, with which the parser outputs the correct parse

    "select": "*",
    "from": "A",
    "where": {
        "and": [
            "false",
            {
                "not": {
                    "or": [
                        "true",
                        "false"
                    ]
                }
            }
        ]
    }
}

I'm not completely familiar with how pyparsing works. Could someone explain in this case why the recursion depth has exceeded 1500 or could this be possibly caused by a bug?

improve parse error reporting

The parse problem in #63 could be seen in the error coming out of the parser, but it not clear.

Having the error include the (partial) line, and a pointer (^) to the exact character, would help.

In general, this is a hard problem: The parser is recursive decent, and has forgotten other failed paths that may point to the "true" error. Distinguishing between legitimate pattern mismatches and parse errors may be impossible without more information. Do we just assume the error is at the end of the longest matched substring? Do we report all parse-tree branches that end in failure? I guess we should wait and record some misleading parse errors before we get fancy,

SQL Parse error

"SELECT oc.dh_registro, oc.id_cliente, cli.nome_reduzido, cli.id_cliente, oc.id_proc_padrao, oc.texto_ocorrencia, count(1) as 'quantidade', opc.contagem_min, opc.contagem_max FROM ocorrencias oc LEFT JOIN ocorrencia_padrao_cliente opc ON oc.id_proc_padrao = opc.id_proc_padrao INNER JOIN cliente cli ON oc.id_cliente = cli.id_cliente WHERE oc.dh_registro BETWEEN '2019-01-07 00:00:00' AND NOW() AND cli.id_cliente='5781' AND oc.oco_cancelada=0 GROUP BY oc.id_proc_padrao ORDER BY oc.id_proc_padrao"

This SQL is not parsing, any suggestions?

Can't run just 'select_one_column' unit test

The unit tests all pass when you run them all, but weird things happen if I try to run just the 'select_one_column' test.

$ python -m unittest tests.test_simple.TestSimple -f
...................s............
----------------------------------------------------------------------
Ran 32 tests in 2.076s

OK (skipped=1)
$ python -m unittest tests.test_simple.TestSimple.select_one_column -f
2018-09-12 13:13:44 - WARNING: Can not expand expected|json|limit(10000) in template: "ERROR: {{test|json|limit(10000)}} does not match expected {{expected|json|limit(10000)}}\n"
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 657, in replacer
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 661, in _simple_expand
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 534, in expand_template
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/exceptions.py", line 108, in __unicode__
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/exceptions.py", line 125, in __str__
        File "/yourusername/anaconda3/lib/python3.6/traceback.py", line 151, in _some_str
        File "/yourusername/anaconda3/lib/python3.6/traceback.py", line 501, in __init__
        File "/yourusername/anaconda3/lib/python3.6/traceback.py", line 486, in __init__
        File "/yourusername/anaconda3/lib/python3.6/traceback.py", line 486, in __init__
        File "/yourusername/anaconda3/lib/python3.6/unittest/result.py", line 186, in _exc_info_to_string
        File "/yourusername/anaconda3/lib/python3.6/unittest/result.py", line 115, in addError
        File "/yourusername/anaconda3/lib/python3.6/unittest/result.py", line 17, in inner
        File "/yourusername/anaconda3/lib/python3.6/unittest/runner.py", line 67, in addError
        File "/yourusername/anaconda3/lib/python3.6/unittest/case.py", line 543, in _feedErrorsToResult
        File "/yourusername/anaconda3/lib/python3.6/unittest/case.py", line 613, in run
        File "/yourusername/anaconda3/lib/python3.6/unittest/case.py", line 653, in __call__
        File "/yourusername/anaconda3/lib/python3.6/unittest/suite.py", line 122, in run
        File "/yourusername/anaconda3/lib/python3.6/unittest/suite.py", line 84, in __call__
        File "/yourusername/anaconda3/lib/python3.6/unittest/suite.py", line 122, in run
        File "/yourusername/anaconda3/lib/python3.6/unittest/suite.py", line 84, in __call__
        File "/yourusername/anaconda3/lib/python3.6/unittest/runner.py", line 176, in run
        File "/yourusername/anaconda3/lib/python3.6/unittest/main.py", line 256, in runTests
        File "/yourusername/anaconda3/lib/python3.6/unittest/main.py", line 95, in __init__
        File "/yourusername/anaconda3/lib/python3.6/unittest/__main__.py", line 18, in <module>
        File "/yourusername/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
        File "/yourusername/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
caused by
        ERROR: Object of type 'Data' is not JSON serializable
        File "/yourusername/anaconda3/lib/python3.6/json/encoder.py", line 180, in default
        File "/yourusername/anaconda3/lib/python3.6/json/encoder.py", line 257, in iterencode
        File "/yourusername/anaconda3/lib/python3.6/json/encoder.py", line 199, in encode
        File "/yourusername/anaconda3/lib/python3.6/json/__init__.py", line 231, in dumps
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 48, in <lambda>
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 184, in json
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 638, in replacer
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 661, in _simple_expand
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/strings.py", line 534, in expand_template
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/exceptions.py", line 108, in __unicode__
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/exceptions.py", line 125, in __str__
        File "/yourusername/anaconda3/lib/python3.6/traceback.py", line 151, in _some_str
.
.
<snip>
.
.
        File "/yourusername/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main

E
======================================================================
ERROR: select_one_column (tests.test_simple.TestSimple)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/yourusername/sandbox/moz-sql-parser/tests/test_simple.py", line 46, in select_one_column
    self.assertEqual(result, expected)
  File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_testing/fuzzytestcase.py", line 57, in assertEqual
    self.assertAlmostEqual(test_value, expected, msg=msg, digits=digits, places=places, delta=delta)
  File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_testing/fuzzytestcase.py", line 54, in assertAlmostEqual
    assertAlmostEqual(test_value, expected, msg=msg, digits=digits, places=coalesce(places, self.default_places), delta=delta)
  File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_testing/fuzzytestcase.py", line 140, in assertAlmostEqual
    cause=e
  File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_logs/__init__.py", line 415, in error
    raise_from_none(e)
  File "<string>", line 2, in raise_from_none
mo_logs.exceptions.Except: ERROR: [template expansion error: (Object of type 'Data' is not JSON serializable)] does not match expected [template expansion error: (Object of type 'Data' is not JSON serializable)]
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_testing/fuzzytestcase.py", line 140, in assertAlmostEqual
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_testing/fuzzytestcase.py", line 54, in assertAlmostEqual
        File "/yourusername/sandbox/moz-sql-parser/venv/lib/python3.6/site-packages/mo_testing/fuzzytestcase.py", line 57, in assertEqual
        File "/yourusername/sandbox/moz-sql-parser/tests/test_simple.py", line 46, in select_one_column
        File "/yourusername/anaconda3/lib/python3.6/unittest/case.py", line 605, in run
.
.
<snip>
.
.
----------------------------------------------------------------------
Ran 1 test in 0.039s

FAILED (errors=1)

The examples given in the README.md are different than actual

Describe the bug

I have tried examples given in the README.md and I noticed that the given output is different than actual.

Screenshots

Usage 1:

>>> from moz_sql_parser import parse
>>> import json
>>> json.dumps(parse("select count(1) from jobs"))
'{"from": "jobs", "select": {"value": {"count": {"literal": 1}}}}'

Actual :

Capture1

Usage 2

>>> json.dumps(parse("select a as hello, b as world from jobs"))
'{"from": "jobs", "select": [{"name": "hello", "value": "a"}, {"name": "world", "value": "b"}]}'

Actual :
Capture2

Usage 3

# SELECT * FROM dual WHERE a>b ORDER BY a+b
{
    "select": "*",
    "from": "dual"
    "where": {"gt": ["a","b"]},
    "orderby": {"add": ["a", "b"]}
}

Actual:
Capture3

Aliases on joined tables not included in parsed result

Example query: parse("SELECT * FROM table1 t1 JOIN table3 t3 ON t1.id = t3.id")

Currently results in:

{
    'from': [
        {'name': 't1', 'value': 'table1'}, 
        {'on': {'eq': ['t1.id', 't3.id']}, 'join': 'table3'}
    ], 
    'select': {'value': '*'}
}

However, the mapping of "table3" to its alias of "t3" is lost. Perhaps the from entry for table3 should be something like, {'on': {'eq': ['t1.id', 't3.id']}, 'join': {'name': 't3', 'value': 'table3'}}

Improving parsing speed

I recently switched a project from using sqlparse to the moz-sql-parser, and overall, I like the moz-sql-parser much more (easy-to-use output, etc). However, I'm finding that sqlparse is at least 10x slower for my tests, so I am interested in how we can speed up this library.

I did try switching the recursion limit from 2000 to 1000, but my sample queries are pretty basic, so that probably explains why the runtime remained the same.

I haven't looked at pyparsing much, but since it appears to be entirely in python, there's probably room to optimize it.

Do you have any other ideas on where things can be optimized?

Issue with queries having negative integers

If query given is : select B,C from table1 where A=-900 or B=100
Then current parsed query is
{u'from': u'table1', u'where': {u'eq': [u'A', {u'neg': {u'or': [900, {u'eq': [u'B', 100]}]}}]}, u'select': [{u'value
': u'B'}, {u'value': u'C'}]}

But I think, It would be much easier if solution would be
{u'from': u'table1', u'where': {u'or': [{u'eq': [u'A', -900]}, {u'eq': [u'B', 100]}]}, u'select': [{u'value': u'B'}, {u'value': u'C'}]}

UNION with SELECT ... ORDER BY fails to parse

Hello,

First of all, thank you for nice parser, it helps much of my usecases.
Here I report some glitch with parsing UNION-ed SELECTs containing ORDER BY.
Confirmed in moz-sql-parser=2.42.19034, pyparsing=2.2.0 on Python3.6.5.

Thanks

>>> mozql.parse('select a from b order by a asc')
{'select': {'value': 'a'}, 'from': 'b', 'orderby': {'value': 'a', 'sort': 'asc'}}
>>> mozql.parse('select a from b order by a desc')
{'select': {'value': 'a'}, 'from': 'b', 'orderby': {'value': 'a', 'sort': 'desc'}}
>>> mozql.parse('select a from b union select 2')
{'union': [{'select': {'value': 'a'}, 'from': 'b'}, {'select': {'value': 2}}]}
>>> mozql.parse('select a from b oder by a union select 2')
Traceback (most recent call last):
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/moz_sql_parser/__init__.py", line 35, in parse
    parse_result = SQLParser.parseString(sql, parseAll=True)
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 1632, in parseString
    raise exc
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 1626, in parseString
    se._parse( instring, loc )
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 1529, in _parseCache
    value = self._parseNoCache(instring, loc, doActions, callPreParse)
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 1379, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 3395, in parseImpl
    loc, exprtokens = e._parse( instring, loc, doActions )
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 1529, in _parseCache
    value = self._parseNoCache(instring, loc, doActions, callPreParse)
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 1383, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/pyparsing.py", line 3183, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected end of text (at char 21), (line:1, col:22)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/developer/.virtualenvs/3/lib/python3.6/site-packages/moz_sql_parser/__init__.py", line 44, in parse
    raise ParseException(sql, e.loc, "Expecting one of (" + (", ".join(expecting)) + ")")
pyparsing.ParseException: Expecting one of (join, offset, union all, right outer join, full outer join, left outer join, right join, order by, left join, limit, full join, having, group by, inner join, where) (at char 21), (line:1, col:22)

bug in parsing ?

It appears that the parser has trouble parsing correctly queries containing "IS NOT NULL".
It translates this to "exists" and then breaks or something?

e.g. this is the query:

SELECT MIN(cn.name) AS from_company
FROM company_name AS cn, company_type AS ct, keyword AS k, movie_link AS ml, title AS t
WHERE cn.country_code !='[pl]' AND ct.kind IS NOT NULL AND t.production_year > 1950 AND ml.movie_id = t.id

and this is its parsed version:

{ 'from': [ {'name': 'cn', 'value': 'company_name'}, {'name': 'ct', 'value': 'company_type'}, {'name': 'k', 'value': 'keyword'}, {'name': 'ml', 'value': 'movie_link'}, {'name': 't', 'value': 'title'}], 'select': {'name': 'from_company', 'value': {'min': 'cn.name'}}, 'where': { 'and': [ {'neq': ['cn.country_code', {'literal': '[pl]'}]}, {'exists': 'ct.kind'}]}}

Reduce recursion limit on master

The recursion limit has been increased to pass tests, but it is too high for production. Add a test that will fail if the recursion limit >1500 and we are on master branch

cannot parse the LIKE syntax

Hi

I'm writing a sql query which contains LIKE syntax and it's giving an error.

$ json.dumps(parse("select empid from emp where ename like 's%' "))

Traceback (most recent call last):
File "<pyshell#7>", line 1, in
json.dumps(parse("select empid from emp where ename like 's%' "))
File "C:\Python27\lib\site-packages\moz_sql_parser_init_.py", line 33, in parse
raise ParseException(sql, e.loc, "Expecting one of (" + (", ".join(expecting)) + ")")
ParseException: Expecting one of (and, gt, is, in, eq, sub, union, add, group by, lt, mult, between, order by, neq, func_param, concat, gte, having, limit, lte, collate nocase, div, or) (at char 34), (line:1, col:35)

Please resolve.

No module named 'mo_future'

Getting this error in Jupyter, with Anaconda python 3.6.3.

Installed in Anaconda via pip-
Successfully installed moz-sql-parser-1.3.18033

from moz_sql_parser import parse

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-18-9c8bbc990e39> in <module>()
----> 1 from moz_sql_parser import parse

~\Anaconda3\lib\site-packages\moz_sql_parser\__init__.py in <module>()
     14 import json
     15 
---> 16 from mo_future import text_type, number_types, binary_type
     17 from pyparsing import ParseException
     18 

ModuleNotFoundError: No module named 'mo_future'

Also tried in centos7 under py 3.6.3, same error.

Successfully installed moz-sql-parser-1.3.18033 pyparsing-2.2.0

> python3 test.py

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    from moz_sql_parser import parse
  File "/home/vagrant/python/sqlparser/ve/lib/python3.6/site-packages/moz_sql_parser/__init__.py", line 16, in <module>
    from mo_future import text_type, number_types, binary_type
ModuleNotFoundError: No module named 'mo_future'

Synthesize SQL from parsed results?

Thanks for sharing a useful project.

Is it possible to synthesize SQL string from the parsed results? I am trying to use this to modify queries, but I can't find how to synthesize SQL from the parsed dict. This project looks much simpler than the other one, and I'd go for this if it supports synthesizing.

Thank you!

Incorrect order of operations

Rules for order of operations specify that addition and subtraction have the same priority and should be applied from left to right. Similarly, multiplication and division have the same priority and should be applied from left to right. But the implementation in moz_sql_parser treats addition as higher priority than subtraction, and it treats multiplication as higher priority than division. For example, the output from moz_sql_parser is as follows:

>>> from moz_sql_parser import parse
>>> import json
>>> json.dumps(parse("select 5-4+2"))
'{"select": {"value": {"sub": [5, {"add": [4, 2]}]}}}'
>>> json.dumps(parse("select 5/4*2"))
'{"select": {"value": {"div": [5, {"mul": [4, 2]}]}}}'

To follow the rules for order of operations, the output should be as follows:

>>> json.dumps(parse("select 5-4+2"))
'{"select": {"value": {"add": [{"sub": [5, 4]}, 2]}}}'
>>> json.dumps(parse("select 5/4*2"))
'{"select": {"value": {"mul": [{"div": [5, 4]}, 2]}}}'

Re-add setup.py to repo so that installs can be done from git URL

It looks like you removed setup.py from the repo and instead have some external tool generate it when doing releases. Unfortunately, by not having a setup.py file in the repo, a pip install from a git URL fails with an error about setup.py being missing.

I ran into this issue when trying to install my fork of the project via: pip install git+https://github.com/pydolan/moz-sql-parser.git@new-join-types. My workaround is to just copy the library into my project until my recent change gets merged, but having the setup.py would help me avoid this minor work (and ease pressure on your end to push releases out often).

Thoughts?

Memory leak related to accumulation of ParseExceptions in `all_exceptions`

I have been tracking down memory issues in my application and stumbled upon a possible issue with the DEBUG option for the parser.

Brief background on my application, it is an evolutionary algorithm that is essentially one big loop. There should be almost no accumulation of data between each iteration of the loop and GC should be able to clean up everything that the previous iteration was manipulating. Below are some counts for different objections I see each iteration of the loop. Clearly some types of objects are sticking around between iterations.

object_growth

(Horizontal axis is loop iteration)

Definitely a lot of pyparsing ParseExceptions, dicts, and tuples being kept around. I used the objgraph package to check what modules were referencing these objects. I checked 10 random instances per loop iteration, and they all pointed to the moz_sql_parser module via lists within the all_exceptions dictionary.

Some example reference graphs:

ParseException

ParseException_reference_graph

Dicts

dict_reference_graph

Tuple

tuple_reference_graph

I then started reading the source code surrounding all_exceptions. I have never used PyParsing directly so I might be wrong, but it looks as though the logic for DEBUG is inverted. As far as I can tell, if DEBUG=False (hardcoded default) then the record_exception function will trigger. If DEBUG=True then record_exception will not be used.

https://github.com/mozilla/moz-sql-parser/blob/dev/moz_sql_parser/sql_parser.py#L39-L42

DEBUG = False

# ...

if DEBUG:
    debug = (None, None, None)
else:
    debug = (nothing, nothing, record_exception)

Shouldn't this be inverted? I only expect to record exceptions if debug is true. Maybe I am missing some details of pyparsing. This logic is the same for master and dev.

remove six requirement

The format tests have this line from six import string_types, text_type. Replace it with mo-future, and remove six from the requirements files

Queries with typed literals fail to parse

In at least MySQL (e.g., with dates), literals can be assigned a type, such as in this query:

SELECT * FROM t1 WHERE col1 > DATE '2011-01-01' - INTERVAL 5 DAYS

This results in a ParseException ("Expecting one of (and, gt, is, union all...").

I might have time to help with a PR. If you have any thoughts on how best to implement this, let me know.

Change "mult" to "mul"

@klahnakoski

I think to keep the same key with Python built-in method __mul__, it can simplify the data structure to map the python class.

Does it make sense, or any comments?

Add deploy test: After pip install, all tests run

As per #11, there is risk that pypi-deployed artifact does not pass tests. Specifically, this was due to missing requires in setup.py, but generally other mistakes can happen.

After the deploy script runs, it should be able to pip-install to a virtualenv and pass the tests. This is a little difficult because there is a delay between the time a module is uploaded to pypi and the time it can be installed.

More helpful error messages

From klahnakoski/mo-logs#2 :

Traceback (most recent call last):
  File "/Users/andersh/Projects/moz-sql-parser/tests/test_simple.py", line 113, in test_bad_select1
    self.assertRaises('Expected select', lambda: parse("se1ect A, B, C from dual"))
  File "/Users/andersh/Projects/moz-sql-parser/venv/lib/python2.7/site-packages/mo_testing/fuzzytestcase.py", line 69, in assertRaises
    cause=f
  File "/Users/andersh/Projects/moz-sql-parser/venv/lib/python2.7/site-packages/mo_logs/__init__.py", line 398, in error
    raise e
Except: ERROR: expecting an exception returning "Expected select" got something else instead

That's a deeply unhelpful message. You should print what that "something else" is.

moz-sql-parse recursion crashes Python

select deflate(ceo.name, 'ceo_name', mobile_price.type, sum(int(mobile_price.price)), '0:普通,1:旗舰', 0) from mobile, company, ceo, mobile_price where mobile.id = mobile_price.mobile_id and mobile.company_id = company.id and company.id = ceo.company_id group by ceo.name, mobile_price.type

Wiki changes

FYI: The following changes were made to this repository's wiki:

These were made as the result of a recent automated defacement of publically writeable wikis.

"tests" should be located under the package

In its current state, PIP will install files in the following locations:

    ..../site-packages/moz_sql_parser-2.18.18240-py3.6.egg-info
    ..../site-packages/moz_sql_parser/*
    ..../site-packages/tests/*

I think the unit tests should be nested under the project.

I discovered this when I ran 'pip uninstall moz_sql_parser' and it showed me the file that were set to be removed.

Ugly multilines

This multiline string messes with Python's nice indenting

def __str__(self):
res = """
SQL: %s
Broken SQL: %s
JSON:
%s
Broken JSON:
%s
""" % (self.expected_sql, self.new_sql, pformat(self.expected_json), pformat(self.new_json))
return res
class TestFormatAndParse(FuzzyTestCase):

Please assign multiline strings to module constants, and use those constants in the code

CODE_OF_CONDUCT.md file missing

As of January 1 2019, Mozilla requires that all GitHub projects include this CODE_OF_CONDUCT.md file in the project root. The file has two parts:

  1. Required Text - All text under the headings Community Participation Guidelines and How to Report, are required, and should not be altered.
  2. Optional Text - The Project Specific Etiquette heading provides a space to speak more specifically about ways people can work effectively and inclusively together. Some examples of those can be found on the Firefox Debugger project, and Common Voice. (The optional part is commented out in the raw template file, and will not be visible until you modify and uncomment that part.)

If you have any questions about this file, or Code of Conduct policies and procedures, please see Mozilla-GitHub-Standards or email [email protected].

(Message COC001)

not support like query

when sql is 'like' query,it throw pyparsing.ParseException: Expecting one of .
i find the source code not contains 'like' function

Support for table-valued functions?

Our project needs to be able to parse "table-valued functions" as from statement, similar to what can be found in postgresql and sqlserver:

sqlserver:
select * from some_table.some_function('parameter', 1, some_col)

postgres:
select * from some_func('parameter', 1, some_col)

I've made a quick test and have this (kindof) working with the sqlserver syntax. I did run into a some problem when using a dot as separator for the function name (probably the matcher eagerly parsing the table name). So for now we use the following format:

select * from some_table::myfunc(...)

It would be great not having to maintain this fork. Can we submit a pull request (after fixing parsing issue "::") or is it outside of the scope of this project?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.