
Comments (4)

klahnakoski commented on August 21, 2024

I am interested in making this parser faster too. I have looked at the internal pyparsing code; it could benefit from cleaning up the basic data structures it uses: using fewer attributes, not copying whole objects, using `__slots__`, etc.
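To illustrate the `__slots__` suggestion, here is a minimal sketch in plain Python. The class names `TokenDict` and `TokenSlots` are hypothetical and are not pyparsing classes; the point is only that a slotted class does not allocate a per-instance `__dict__`, which is usually where the memory and attribute-lookup savings come from.

```python
class TokenDict:
    """Ordinary class: every instance carries its own __dict__."""

    def __init__(self, text, start, end):
        self.text = text
        self.start = start
        self.end = end


class TokenSlots:
    """Same fields, declared in __slots__ so no per-instance __dict__ is created."""

    __slots__ = ("text", "start", "end")

    def __init__(self, text, start, end):
        self.text = text
        self.start = start
        self.end = end


plain = TokenDict("SELECT", 0, 6)
slotted = TokenSlots("SELECT", 0, 6)

# The slotted instance has fixed storage for its three fields only.
print(hasattr(plain, "__dict__"))
print(hasattr(slotted, "__dict__"))

# Assigning an undeclared attribute fails on the slotted class.
try:
    slotted.extra = 1
    slot_rejected_extra = False
except AttributeError:
    slot_rejected_extra = True
```

The trade-off is that slotted instances cannot grow new attributes at runtime, so any code that attaches ad hoc attributes to parser objects would need to be reviewed first.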

I have a GSoC project proposal for this summer that deals with the high-level algorithm slowness (https://github.com/mozilla/moz-sql-parser/blob/dev/docs/Student%20Project%202019.md). If this proposal can be rewritten to include the speedup you want, or if you write a whole other project plan, I am willing to submit it to GSoC and mentor it over the summer, if a suitable student applies.

You may consider running moz-sql-parser with PyPy for cheap speed increases on bigger parsing loads.

If you have some experience with parser generators, and some wise words about how to make them faster, please add your ideas here.

Finally, the pyparsing module owner may be open to refactoring the code for a version 3 (see the comment on pyparsing/pyparsing#66), so we may get him involved in whatever the project turns out to be.

from moz-sql-parser.

klahnakoski commented on August 21, 2024

Please see more detail in the student project.


ptmcg commented on August 21, 2024

Early in pyparsing's life I did some performance optimizations - I think one of the most notable (not counting packrat parsing) was converting Literal._parseImpl matching to using startswith instead of string slicing. It is entirely possible that these optimizations may have been peculiar to the Python 2.4 or 2.5 code of its day, and that current Python versions may work better with string slicing (and thereby avoiding the startswith function call).
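Whether startswith or slicing is faster on a current interpreter is easy to measure. The following is a rough sketch using the standard timeit module; the helper names `match_startswith` and `match_slice` are made up for illustration and are not pyparsing's actual code, and the numbers will vary by Python version and machine.

```python
import timeit


def match_startswith(instring, loc, literal):
    # Test for the literal at position loc without building a substring.
    return instring.startswith(literal, loc)


def match_slice(instring, loc, literal):
    # Build a substring of the same length and compare it to the literal.
    return instring[loc:loc + len(literal)] == literal


text = "SELECT a, b FROM t WHERE a = 1"
# (position, literal) pairs: some match, some do not, one is past the end.
cases = [(0, "SELECT"), (7, "a"), (12, "FROM"), (12, "WHERE"), (29, "1"), (40, "x")]

# Both strategies must agree before their timings are worth comparing.
results_startswith = [match_startswith(text, loc, lit) for loc, lit in cases]
results_slice = [match_slice(text, loc, lit) for loc, lit in cases]


def run(matcher):
    for loc, lit in cases:
        matcher(text, loc, lit)


t_startswith = timeit.timeit(lambda: run(match_startswith), number=50_000)
t_slice = timeit.timeit(lambda: run(match_slice), number=50_000)
print(f"startswith: {t_startswith:.3f}s  slice: {t_slice:.3f}s")
```

A micro-benchmark like this only hints at the answer; a profile of a real grammar over a real corpus is the more reliable guide.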

Coverage and profiling could probably benefit from using a complex parser and input corpus. For my earlier optimization work I used the Verilog parser that is included in the examples, with Verilog scripts I found through googling. Unfortunately I do not include these scripts in the pyparsing distribution, as they are published under a variety of licenses and could muck up pyparsing's MIT license.

In general, pyparsing has 3 phases:

  1. parser creation
  2. parser run-time
  3. post-parse results processing

In order of importance for optimization, I would rank 2 highest, and can see it as a tie between 1 and 3. For very large parsers, I have seen people gain from pickling their parser using dill and then unpickling prior to parsing, cutting the phase 1 time roughly in half. But otherwise, I would not put a lot of time into optimizing phase 1 code. As for phase 3 (consisting mostly of navigating the constructed ParseResults structures), I've not gotten many complaints, perhaps just because phase 2 is so slow, and because any performance issues here depend largely on how the structures are created, how they are navigated, and what processing is being done with the results.

Oh, I also forgot phase 0, which would be importing pyparsing to begin with. I was very careful in the recent addition of pyparsing_unicode to do as much lazy initialization as possible, so that users who were not parsing multi-lingual text would not have to construct large character lists that they would not need. I would also be interested to know whether more could be done to defer some of the JIT'ing in PyPy, which already has a large startup cost; if I can avoid adding to that, I think that would be a good thing.
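As an illustration of the lazy-initialization idea, here is a small sketch using functools.cached_property (Python 3.8+). The `UnicodeRange` class is hypothetical and is not how pyparsing_unicode is actually implemented; it only shows a character list being built on first access rather than at import time.

```python
from functools import cached_property


class UnicodeRange:
    """Holds a code point range; the character string is built only on demand."""

    def __init__(self, first, last):
        self.first = first
        self.last = last
        self.builds = 0  # counts how often the expensive construction runs

    @cached_property
    def chars(self):
        # Runs on first access only; the result is then stored on the instance.
        self.builds += 1
        return "".join(chr(c) for c in range(self.first, self.last + 1))


# Creating the object is cheap: nothing has been built yet.
greek = UnicodeRange(0x0370, 0x03FF)
builds_before_access = greek.builds

# The first access pays the construction cost, the second reuses the cached value.
first_access = greek.chars
second_access = greek.chars
print(len(first_access), greek.builds)
```

Users who never touch `chars` never pay for building it, which is the property described above for users who are not parsing multi-lingual text.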


klahnakoski commented on August 21, 2024

fixed with #147

