Comments (8)
@eecue1 There's nothing I can do to improve Microsoft's RegEx parser. The
mainline regex is slated to change in the next release so that may provide some
improvement. Also, I'm going to experiment with using a state machine for
parsing in future releases.
If you want to help, the project could use some performance tests. I'm not sure
what you're currently using, but if you have any good code that could be adapted
into tests, I'm always open to contributions.
Original comment by [email protected]
on 16 Aug 2012 at 3:50
- Added labels: Priority-Low, Type-Other
- Removed labels: Priority-Medium, Type-Defect
from jquery-csv.
Getting rid of the extraneous construction of intermediate RegExps (as I
suggest in my fix for Issue #5) and getting rid of reValid entirely (as I
suggest in Issue #7) might make a difference.
Of course, #7 involves another regex test, but it's on the end of the string,
and a much simpler regex.
The regex in reValid requires 101 steps to match the test data line, which
isn't horrible; there doesn't seem to be any combinatorial backtracking going
on. (I use RegexBuddy to analyze regexes -- I expect you'll find it very useful.
http://www.regexbuddy.com -- well worth the $40 if you deal with complex
regexes a lot.)
Also, ANTLR 3 supports JavaScript as a target. I've not seen what sort of
JavaScript it produces...
Original comment by [email protected]
on 4 Sep 2012 at 10:09
@[email protected] Yeah, well, what is it, 6 or 8 regex constructions per entry
that gets parsed? We're definitely talking about sloppy O(n) performance on the
regex construction alone. I'm well aware of the issue.
I think if the line-splitter function (e.g. csv2Array) were to pass a closure
into the entry-parser function (e.g. csvEntry2Array), then all you'd need to do
is check the state of the closure and use the enclosed regex objects if they're
available.
Since the regexes can be compiled on the first pass alone, that should change
the regex construction to O(1) complexity.
Of course, that's all theoretical. It should work but I rarely play with
closures so it'll probably take some fiddling before I can get it to work.
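A minimal sketch of that idea, assuming a naive entry parser that only splits
on the separator and ignores quoted values. The names makeEntryParser and
csvToArrays, and the regex itself, are hypothetical and not taken from
jquery-csv:

```javascript
// Hypothetical helper: returns an entry parser that closes over its regex cache.
function makeEntryParser(separator) {
  var cache = { reSeparator: null, constructions: 0 };

  function parseEntry(line) {
    // Compile on the first call only; later calls reuse the enclosed object.
    if (cache.reSeparator === null) {
      // Escape regex metacharacters so a separator like "|" matches literally.
      var escaped = separator.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
      cache.reSeparator = new RegExp('\\s*' + escaped + '\\s*');
      cache.constructions += 1;
    }
    return line.trim().split(cache.reSeparator);
  }

  // Expose the counter so the caller can check how often the regex was built.
  parseEntry.constructions = function () {
    return cache.constructions;
  };
  return parseEntry;
}

// Hypothetical line splitter: creates one parser and reuses it for every line.
function csvToArrays(csv, separator) {
  var parseEntry = makeEntryParser(separator || ',');
  var rows = csv
    .split(/\r\n|\r|\n/)
    .filter(function (line) { return line.length > 0; })
    .map(function (line) { return parseEntry(line); });
  return { rows: rows, constructions: parseEntry.constructions() };
}
```

However many lines go in, the constructions counter should stay at 1, which
is the O(1) behaviour described above.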
Original comment by [email protected]
on 5 Sep 2012 at 7:06
The reValid regex has been disabled. It breaks on the newlines-as-value edge
case and isn't really necessary now that the project has some good test
coverage.
Maybe that will give a slight boost in performance. Next up, I'm going to work
on minimizing the number of regex object constructions.
Original comment by [email protected]
on 9 Sep 2012 at 10:53
OK, the last performance fix is in.
The regex object constructions have been reduced from O(n) to O(1) complexity.
Basically, instead of constructing all new regex objects every time the parser
is called, they are only constructed on the first pass and passed back up the
chain for re-use via a closure.
For example, a call to $.csv.toArrays() now requires only 3 object
constructions no matter how large the input dataset is, whereas the old method
added 3 new constructions for every entry in the CSV dataset.
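To make the difference concrete, here is a rough sketch that counts
constructions under both arrangements. The three patterns, the global counter,
and all function names are assumptions for illustration; the regexes jquery-csv
actually builds are not shown in this thread:

```javascript
// Hypothetical counter used only to make the number of constructions visible.
var constructions = 0;

// Assumed patterns standing in for the three regexes built per parse.
function buildRegexes(separator) {
  constructions += 3;
  return {
    separator: new RegExp(separator),
    leading: new RegExp('^\\s+'),
    trailing: new RegExp('\\s+$')
  };
}

// Naive entry parser: split on the separator and trim each value.
function splitEntry(entry, re) {
  return entry.split(re.separator).map(function (value) {
    return value.replace(re.leading, '').replace(re.trailing, '');
  });
}

// Old arrangement: every entry rebuilds its regexes (3 per entry).
function parseRebuilding(entries) {
  return entries.map(function (entry) {
    return splitEntry(entry, buildRegexes(','));
  });
}

// New arrangement: the caller owns a state object and the first pass fills it.
function parseReusing(entries) {
  var state = {};
  return entries.map(function (entry) {
    state.re = state.re || buildRegexes(',');
    return splitEntry(entry, state.re);
  });
}

// Resets the counter, runs one parser, and reports how many regexes it built.
function countConstructions(parse, entries) {
  constructions = 0;
  parse(entries);
  return constructions;
}
```

With four entries, countConstructions(parseRebuilding, entries) comes out at 12
and countConstructions(parseReusing, entries) at 3; only the first number grows
with the input.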
In the tests alone (that use minimal datasets) the number of constructions is
reduced from 91 to 21.
Chrome does a lot to optimize away the difference, but IE's JavaScript engine
isn't nearly as optimized, so the new update will probably have a greater
impact there.
Try it out and let me know if the performance has improved drastically.
Otherwise, I'm going to assume that this is fixed and close it.
Original comment by [email protected]
on 7 Oct 2012 at 2:54
Original comment by [email protected]
on 7 Oct 2012 at 2:54
- Changed state: Fixed
Original comment by [email protected]
on 11 Oct 2012 at 4:07
Original comment by [email protected]
on 15 Oct 2012 at 10:39
- Changed state: Verified