I recently switched a project from using <a href="https://github.com/andialbrecht/sqlp

Improving parsing speed about moz-sql-parser HOT 4 CLOSED

mozilla commented on August 21, 2024

Improving parsing speed

from moz-sql-parser.

Comments (4)

klahnakoski commented on August 21, 2024

I am interested in making this parser faster too. I have looked at the internal pyparsing code; it could benefit from cleaning up the basic data structures it uses: using fewer attributes, not copying whole objects, using `__slots__`, etc.
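As a minimal sketch of the `__slots__` idea mentioned above: the two classes below are hypothetical stand-ins, not pyparsing's actual classes. They hold the same three fields, but the slotted one has no per-instance `__dict__`, which generally reduces memory per object and can make attribute access slightly cheaper.

```python
class TokenDict:
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end


class TokenSlots:
    """Same fields, but __slots__ removes the per-instance __dict__."""

    __slots__ = ("name", "start", "end")

    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end


plain = TokenDict("select", 0, 6)
slotted = TokenSlots("select", 0, 6)

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

# A slotted instance rejects attributes that are not declared in __slots__.
try:
    slotted.extra = 1
except AttributeError as exc:
    print("rejected:", exc)
```

Whether this helps in practice depends on how many such objects a parse creates, so it is worth measuring before and after.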

I have a GSoC project proposal for this summer that deals with the high-level algorithm slowness: https://github.com/mozilla/moz-sql-parser/blob/dev/docs/Student%20Project%202019.md. If this proposal can be rewritten to include the speedup you want, or if you write a whole other project plan, I am willing to submit it to GSoC and mentor it over the summer, if a suitable student applies.

You may consider running moz-sql-parser with PyPy for cheap speed increases on bigger parsing loads.

If you have some experience with parser generators, and some wise words about how to make them faster, please add your ideas here.

Finally, the pyparsing module owner may be open to refactoring the code for a version 3 (pyparsing/pyparsing#66 (comment)). So we may get him involved in whatever the project may be.

klahnakoski commented on August 21, 2024

Please see more detail in the student project

ptmcg commented on August 21, 2024

Early in pyparsing's life I did some performance optimizations - I think one of the most notable (not counting packrat parsing) was converting Literal._parseImpl matching to using startswith instead of string slicing. It is entirely possible that these optimizations may have been peculiar to the Python 2.4 or 2.5 code of its day, and that current Python versions may work better with string slicing (and thereby avoiding the startswith function call).
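To check which variant is faster on a given interpreter, a small `timeit` comparison is enough. This is only a sketch: `match_startswith` and `match_slice` are hypothetical helpers that imitate the two matching strategies described above, not pyparsing's real code, and the sample SQL string is made up.

```python
import timeit


def match_startswith(instring, loc, literal):
    """Return the location after `literal` if it occurs at `loc`, else -1."""
    if instring.startswith(literal, loc):
        return loc + len(literal)
    return -1


def match_slice(instring, loc, literal):
    """Same contract, implemented with string slicing and comparison."""
    end = loc + len(literal)
    if instring[loc:end] == literal:
        return end
    return -1


text = "SELECT a, b FROM t WHERE a = 1"
loc = text.index("FROM")


def compare(number=100_000):
    """Time both variants; results vary by interpreter and version."""
    args = (text, loc, "FROM")
    return {
        "startswith": timeit.timeit(lambda: match_startswith(*args), number=number),
        "slice": timeit.timeit(lambda: match_slice(*args), number=number),
    }


print(compare())
```

Running the same script under CPython and PyPy shows whether the old optimization still pays off on each.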

Coverage and profiling could probably benefit from using a complex parser and input corpus. For my earlier optimization work I used the Verilog parser that is included in the examples, with Verilog scripts I found through googling. Unfortunately I do not include these scripts in the pyparsing distribution, as they are published under a variety of licenses and could muck up pyparsing's MIT license.

In general, pyparsing has 3 phases:

1. parser creation
2. parser run-time
3. post-parse results processing

In order of importance for optimization, I would rank phase 2 highest, with phases 1 and 3 tied. For very large parsers, I have seen people gain from pickling their parser using dill and then unpickling it prior to parsing, cutting the phase 1 time roughly in half. Otherwise, I would not put a lot of time into optimizing phase 1 code. As for phase 3 (consisting mostly of navigating the constructed ParseResults structures), I've not gotten many complaints, perhaps just because phase 2 is so slow, and because any performance issues here depend largely on how the structures are created, how they are navigated, and what processing is done with the results.
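The pickling trick can be sketched as below. This is an illustration under assumptions: it uses the standard library's `pickle` on a toy keyword trie in place of dill and a real pyparsing grammar, so that it runs without extra packages. dill exposes `dumps` and `loads` with the same names. `build_keyword_trie` and `match_keyword` are hypothetical stand-ins for phase 1 and phase 2.

```python
import pickle


def build_keyword_trie(keywords):
    """Stand-in for phase 1: build a nested-dict trie of keywords."""
    root = {}
    for word in keywords:
        node = root
        for ch in word.upper():
            node = node.setdefault(ch, {})
        node[None] = word.upper()  # None marks the end of a keyword
    return root


def match_keyword(trie, text, loc=0):
    """Stand-in for phase 2: longest keyword starting at `loc`, or None."""
    node, found = trie, None
    for ch in text[loc:].upper():
        node = node.get(ch)
        if node is None:
            break
        found = node.get(None, found)
    return found


KEYWORDS = ["select", "from", "where", "group", "order", "by"]

# Build once, serialize, and restore instead of rebuilding.
trie = build_keyword_trie(KEYWORDS)
blob = pickle.dumps(trie, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

print(match_keyword(restored, "from t where a = 1"))
```

In a real program the serialized bytes would be written to a file at build time and loaded at startup. Whether loading beats rebuilding depends on the grammar, so it should be timed for the parser in question.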

Oh, I also forgot phase 0, which is importing pyparsing in the first place. I was very careful in the recent addition of pyparsing_unicode to do as much lazy initialization as possible, so that users who were not parsing multi-lingual text would not have to construct large character lists that they would not need. I would also be interested to know whether more could be done to defer some of the JIT'ing in PyPy, which already has a large startup cost; avoiding adding to that would be a good thing.
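Lazy initialization of this kind can be sketched with `functools.cached_property` (Python 3.8+). The `UnicodeRange` class below is a hypothetical illustration, not the actual pyparsing_unicode implementation: the character list is only built the first time it is read, so importing the module stays cheap for users who never touch it.

```python
from functools import cached_property


class UnicodeRange:
    """Hypothetical character-set holder; not pyparsing's actual class."""

    def __init__(self, first, last):
        self.first = first
        self.last = last

    @cached_property
    def printables(self):
        # Built on first access only, then stored on the instance.
        chars = (chr(c) for c in range(self.first, self.last + 1))
        return "".join(
            ch for ch in chars if ch.isprintable() and not ch.isspace()
        )


greek = UnicodeRange(0x0370, 0x03FF)

# Nothing has been computed yet; the string is built on this first access.
print(len(greek.printables))
# Later accesses reuse the cached string.
print("\u03b1" in greek.printables)
```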

klahnakoski commented on August 21, 2024

fixed with #147
