
Comments (8)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024
@eecue1 There's nothing I can do to improve Microsoft's RegEx parser. The 
mainline regex is slated to change in the next release so that may provide some 
improvement. Also, I'm going to experiment with using a state machine for 
parsing in future releases.

If you want to help, the project could use some performance tests. I'm not sure 
what you're currently using, but if you have any good code that could be adapted 
into tests, I'm always open to contributions.
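
As a rough starting point for such a performance test, here is a minimal sketch of a timing harness. The names `makeCsv`, `timeParse`, and `standInParse` are hypothetical and not part of jquery-csv. The stand-in parser is only a placeholder to be swapped for the real call under test.

```javascript
// Build a synthetic CSV string with `rows` lines of `cols` quoted values.
function makeCsv(rows, cols) {
  var lines = [];
  for (var r = 0; r < rows; r++) {
    var cells = [];
    for (var c = 0; c < cols; c++) {
      cells.push('"r' + r + 'c' + c + '"');
    }
    lines.push(cells.join(','));
  }
  return lines.join('\n');
}

// Run `parse(input)` `runs` times and return the best wall-clock time in ms.
function timeParse(parse, input, runs) {
  var best = Infinity;
  for (var i = 0; i < runs; i++) {
    var start = Date.now();
    parse(input);
    best = Math.min(best, Date.now() - start);
  }
  return best;
}

// Hypothetical stand-in parser (assumption): replace with the parser under test.
function standInParse(csv) {
  return csv.split('\n').map(function (line) {
    return line.split(',');
  });
}

var input = makeCsv(1000, 5);
var elapsedMs = timeParse(standInParse, input, 5);
console.log('best of 5 runs: ' + elapsedMs + ' ms');
```

Taking the best of several runs reduces noise from garbage collection and JIT warm-up. Growing `rows` between runs would show whether parse time scales linearly with input size.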

Original comment by [email protected] on 16 Aug 2012 at 3:50

  • Added labels: Priority-Low, Type-Other
  • Removed labels: Priority-Medium, Type-Defect

from jquery-csv.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024
Getting rid of the extraneous construction of intermediate RegExp objects (as I 
suggest in my fix for Issue #5) and getting rid of reValid entirely (as I 
suggest in Issue #7) might make a difference.

Of course, #7 involves another regex test, but it's on the end of the string, 
and a much simpler regex.

The regex in reValid requires 101 steps to match the test data line, which 
isn't horrible; there doesn't seem to be any combinatorial backtracking going 
on. (I use RegexBuddy to analyze regexes -- I expect you'll find it very useful: 
http://www.regexbuddy.com -- well worth the $40 if you deal with complex 
regexes a lot.)

Also, ANTLR 3 supports JavaScript as a target. I've not seen what sort of 
JavaScript it produces...

Original comment by [email protected] on 4 Sep 2012 at 10:09


GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024
@[email protected] Yeah, well, what is it, 6 or 8 regex constructions per entry that 
gets parsed? We're definitely talking about sloppy O(n) performance on the 
regex construction alone. I'm well aware of the issue.

I think that if the line-splitter function (e.g. csv2Array) were to pass a closure 
into the entry-parser function (e.g. csvEntry2Array), then all you'd need to do is 
check the state of the closure and use the enclosed regex objects if they're 
available.

Since the regexes can be compiled on the first pass alone, that should change 
the regex construction to O(1) complexity.

Of course, that's all theoretical. It should work but I rarely play with 
closures so it'll probably take some fiddling before I can get it to work.

Original comment by [email protected] on 5 Sep 2012 at 7:06


GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024
The reValid regex has been disabled. It breaks on the newlines-as-value edge 
case and isn't really necessary now that the project has some good test 
coverage.

Maybe that will give a slight boost in performance. Next up, I'm going to work 
on minimizing the number of regex object constructions.

Original comment by [email protected] on 9 Sep 2012 at 10:53


GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024
OK, the last performance fix is in.

The regex object constructions have been reduced from O(n) to O(1) complexity. 
Basically, instead of constructing all new regex objects every time the parser 
is called, they are only constructed on the first pass and passed back up the 
chain for re-use via a closure.

For example, on a call to $.csv.toArrays() the new arrangement requires only 
3 object constructions no matter how large the input dataset is, whereas the 
old method added 3 new constructions for every entry in the CSV dataset.
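
The construct-once idea can be sketched roughly as follows. This is not the project's actual code: `buildRegex`, `parseEntry`, and `parseAll` are hypothetical names, the three patterns are illustrative assumptions, and a plain shared `state` object stands in for whatever closure mechanism the library uses. The naive split also ignores delimiters inside quoted values.

```javascript
// Counts how many RegExp objects get built (for illustration only).
var constructions = 0;
function buildRegex(source, flags) {
  constructions++;
  return new RegExp(source, flags);
}

// Hypothetical entry parser. `state` carries compiled regexes between calls,
// so they are built only when they are missing (i.e. on the first entry).
function parseEntry(line, state) {
  if (!state.re) {
    state.re = {
      delimiter: buildRegex(',', 'g'),
      quoted: buildRegex('^"(.*)"$'),
      escaped: buildRegex('""', 'g')
    };
  }
  var re = state.re;
  return line.split(re.delimiter).map(function (value) {
    var m = re.quoted.exec(value);
    // Strip the surrounding quotes and unescape doubled quotes.
    return m ? m[1].replace(re.escaped, '"') : value;
  });
}

// Hypothetical line splitter: creates the shared state once per call.
function parseAll(csv) {
  var state = {};
  return csv.split('\n').map(function (line) {
    return parseEntry(line, state);
  });
}

var rows = parseAll('a,"b ""x""",c\n1,2,3\n4,5,6');
console.log(rows.length, constructions); // 3 3
```

With three entries the counter ends at 3 rather than 9, because only the first `parseEntry` call finds `state.re` unset. Each new `parseAll` call starts with fresh state, so the cost is constant per call rather than per entry.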

In the tests alone (that use minimal datasets) the number of constructions is 
reduced from 91 to 21.

Chrome does a lot to optimize away the difference, but IE's JavaScript engine 
isn't nearly as optimized, so the new update will probably have a greater 
impact there.

Try it out and let me know if the performance has improved drastically. 
Otherwise, I'm going to assume that this is fixed and close it.

Original comment by [email protected] on 7 Oct 2012 at 2:54


GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024

Original comment by [email protected] on 7 Oct 2012 at 2:54

  • Changed state: Fixed


GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024

Original comment by [email protected] on 11 Oct 2012 at 4:07


GoogleCodeExporter avatar GoogleCodeExporter commented on August 21, 2024

Original comment by [email protected] on 15 Oct 2012 at 10:39

  • Changed state: Verified

