Comments (14)
@kou
Yes, I would like to.
I reproduced this problem with a CSV file that I made using a spreadsheet tool and an IDE.
So, I will write a script that produces such a CSV.
from csv.
@kou
I was able to reproduce this bug locally.
Thanks for your report.
It may be a bug in the internal chunk-based stream parser:
Lines 299 to 308 in 22e62bc
@abcdefg-1234567 Great! Do you want to work on fixing this problem?
Could you share a script that produces a CSV that reproduces this problem as the first step?
@kou
I came up with the following code that creates a CSV.
require "csv"
CSV.open('test.csv', 'w') do |csv|
  2500.times do
    csv << ['AAAA1234567890']
  end
end
However, I do not know how to produce the following conditions in code.
- file with CRLF line endings
- file with no EOL/trailing newline
I have confirmed that the bug is reproduced when the above conditions are set manually using the IDE.
Could you please give me any ideas?
hey @abcdefg-1234567 you can use the following:
File.open('test.csv', 'w') do |f|
  2499.times do
    f.print("AAAA1234567890\r\n")
  end
  f.print("AAAA1234567890")
end
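To double-check that a generated file really has both properties (CRLF endings and no trailing newline), a small check like the one below may help. This is only a sketch: `eol_report` is a hypothetical helper name, not part of the csv gem, and it inspects a string here rather than the file so that it runs standalone.

```ruby
# Hypothetical helper (not part of the csv gem): summarizes the line
# endings found in raw CSV data.
def eol_report(bytes)
  {
    crlf_count: bytes.scan("\r\n").size,           # number of CRLF sequences
    bare_lf_count: bytes.scan(/(?<!\r)\n/).size,   # LF not preceded by CR
    trailing_newline: bytes.end_with?("\n"),       # true if the data ends with an EOL
  }
end

# Same content as the script above writes: 2499 CRLF-terminated rows
# followed by one row without any EOL.
sample = ("AAAA1234567890\r\n" * 2499) + "AAAA1234567890"
p eol_report(sample)
```

For the real file, `eol_report(File.binread('test.csv'))` should work the same way; reading in binary mode avoids any newline conversion on Windows.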
@GabrielNagy
Thank you!
OK. As the next step, let's reduce the size of the reproducible CSV as much as possible to make debugging easier.
If that's difficult, we can start debugging with the current reproducible CSV.
#279 (comment) may help you.
I have confirmed that the bug does not reproduce if the CSV has fewer than 2048 rows.
I have confirmed the following.
The result of "value = parse_column_value" (line 1030 of parser.rb) when @lineno=2048 is "AAAA1234567890AAAAA1234567890".
I am also wondering if changes are needed around the adjust_last_keep method.
@kou
Could you please explain the role of this method?
Sure.
#adjust_last_keep was introduced for fixing https://bugs.ruby-lang.org/issues/18245 .
InputsScanner acts as logically one StringScanner with multiple inputs. (StringScanner can't work with multiple strings.)
CSV::Parser may want to push back read data. For example, if skip_lines is specified, CSV::Parser may push back read data. CSV::Parser reads a line from its scanner (CSV::Parser::Scanner or CSV::Parser::InputsScanner) to check whether the line should be skipped. If the read line isn't a skip target, CSV::Parser pushes back the read line and parses the line as a CSV line. keep_start/keep_drop/keep_back/keep_end are for it.
adjust_last_keep is related to these keep_* methods. InputsScanner processes multiple inputs. So the target data (for example, one line for skip_lines) may exist in multiple inputs. For example, "# a", "bc" and "\n" are one line but they are 3 inputs. adjust_last_keep is for this situation. If we need to concatenate data from multiple inputs, adjust_last_keep does it.
I hope that this explanation helps you.
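As a rough illustration of the idea described above, here is a toy model of a scanner that reads one logical line across several inputs and can push it back. It is only a sketch under simplifying assumptions: `ToyInputsScanner`, `read_line` and `load_next_chunk` are hypothetical names, and this is not how CSV::Parser::InputsScanner is actually implemented. The `keep_start`, `keep_end` and `keep_back` names are borrowed from the explanation, but their behavior here is simplified.

```ruby
require "strscan"

# Toy model only: NOT the csv gem's CSV::Parser::InputsScanner.
class ToyInputsScanner
  def initialize(chunks)
    @chunks = chunks.dup                      # inputs not loaded yet
    @scanner = StringScanner.new(@chunks.shift || "")
    @kept = nil                               # kept text from exhausted inputs
    @keep_pos = nil                           # keep start offset in the current input
  end

  # Starts remembering everything that is consumed from now on.
  def keep_start
    @kept = +""
    @keep_pos = @scanner.pos
  end

  # Returns the text consumed since keep_start, even across inputs.
  def keep_end
    text = @kept + @scanner.string.byteslice(@keep_pos, @scanner.pos - @keep_pos)
    @kept = @keep_pos = nil
    text
  end

  # Pushes the kept text back so that it can be scanned again.
  def keep_back
    text = keep_end
    @scanner = StringScanner.new(text + @scanner.rest)
  end

  # Reads through the next "\n", loading further inputs when needed.
  def read_line
    line = +""
    loop do
      part = @scanner.scan_until(/\n/)
      return line + part if part
      line << @scanner.rest
      @scanner.terminate
      return (line.empty? ? nil : line) unless load_next_chunk
    end
  end

  private

  # Moves to the next input. While keeping, the consumed part of the
  # exhausted input is concatenated into @kept first; this is the role
  # the explanation above gives to adjust_last_keep.
  def load_next_chunk
    return false if @chunks.empty?
    if @keep_pos
      @kept << @scanner.string.byteslice(@keep_pos, @scanner.pos - @keep_pos)
      @keep_pos = 0
    end
    @scanner = StringScanner.new(@chunks.shift)
    true
  end
end

scanner = ToyInputsScanner.new(["# a", "bc", "\n", "x,y\n"])
scanner.keep_start
line = scanner.read_line   # one line that spans three inputs
scanner.keep_back          # pretend the line is not a skip target
p line
p scanner.read_line        # the same line is read again after the push back
```

The step to look at is `load_next_chunk`: if the kept text is not concatenated correctly when an input boundary is crossed, a push back can replay the wrong data.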
Thank you for your detailed explanation!
I will refer to this and continue the investigation.
Including the line number in the line contents will be helpful:
File.open('/tmp/test.csv', 'w') do |f|
  lines = 2500.times.collect do |i|
    "A%013d" % i
  end
  f.print(lines.join("\r\n"))
end
Output with the test file:
...
A0000000002497
A0000000002498
A0000000002499A0000000002499
It seems that the last line was used twice.
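Because every row now carries its own number, the parsed rows can be compared with the expected ones automatically instead of reading the output by eye. The following is a minimal sketch: `row_mismatches` is a hypothetical helper, and `CSV.parse` is called without options here, so the parser options from the original report would need to be added to exercise the same code path.

```ruby
require "csv"

# Hypothetical helper: returns [index, expected_row, actual_row] for every
# position where the two row lists differ (nil marks a missing row).
def row_mismatches(expected, actual)
  size = [expected.size, actual.size].max
  (0...size).filter_map do |i|
    [i, expected[i], actual[i]] unless expected[i] == actual[i]
  end
end

expected = 2500.times.map { |i| ["A%013d" % i] }
# CRLF between rows and no trailing newline, as in the script above.
data = expected.map(&:first).join("\r\n")
actual = CSV.parse(data)

# An empty list means every row was parsed as expected.
pp row_mismatches(expected, actual)
```

Whether any mismatch is reported depends on the csv version and on the parser options in use.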
I could reproduce this with the following script:
ENV["CSV_PARSER_SCANNER_TEST"] = "yes"
require "csv"
csv = CSV.new("a\r\nb", row_sep: "\r\n", strip: true, skip_lines: /\A *\z/)
csv.each do |row|
  pp row
end
Output:
["a"]
["bb"]