Comments (9)
I'm using the version included in ruby 2.6.5, btw!
from csv.
The stack trace indicates that it's failing in parse_quotable_robust
. I'm thinking the problem may be that in prepare_unquoted
, the @unquoted_value regexp
does not take into account the value of row_sep. Therefore, the |
is consumed by the scanner as part of the unquoted field value. After this, the "|\n"
no longer exists to denote a line break, and instead the \n is interpreted as the next character of this field.
from csv.
Why do you want to use "|\n"
?
Could you show a real world use case?
from csv.
We're using this because we're interfacing with other systems that have very poor support for CSV, and get confused by newlines in quoted fields. That said, these files are created and then transformed by our software before being passed off to the other software (various RDBMSes) so during that first, internal phase, we could just use the default row/column separators, and avoid needing to parse the non-standard format.
If you don't want to support it, that's ok but an error should probably be raised when reading with non-standard row/col separators.
from csv.
Could you show the real CSV that has problem?
The CSV does NOT use "|\n"
as row separator, right?
from csv.
The CSV was using "|\n" as a row separator.
This CSV is used for an export/import process that involves creating CSV files, transforming them, and then finally importing them into MySQL, Postgres, Oracle, etc. We use a "|\n" separator in the final CSV files to overcome very poor CSV support in some of these databases.
It used to be the case that the initial CSV files were written with the "|\n" row separator, which meant that we had to support it when reading the CSV files for the transformation & filtering step. The generated CSV was not any more complicated than what was produced in the initial example I provided.
After discovering this issue, I have modified our system to only use the "|\n" separator in the final step - right before the file is imported into the other SQL databases.
from csv.
Ah, sorry. I want to know the input of your transformation system. I think that the following is a situation of this case:
X -CSV(X)-> [YOUR SYSTEM] -CSV(RDBMS)-> RDBMS
I want to know a sample content of CSV(X)
.
from csv.
The whole process starts with our database, so the situation is more like:
step | owner | system | name | read row_sep | write row_sep |
---|---|---|---|---|---|
1 | me | ruby (rails) | DB to CSV (exporting) | "|\n" |
|
2 | me | ruby | CSV to CSV (filtering) | "|\n" |
"|\n" |
3 | me | ruby | CSV to CSV-like (transforming) | "|\n" |
various things, including "|\n" |
4 | not me | RDBMS | CSV-like to DB | various things, including "|\n" |
- the data is in our database
- we export that data into many CSV files
- those CSV files are transformed by removing rows and columns
- those CSV files are transformed into an alternate CSV format, depending on what RDBMS they will be loaded into
- they're loaded into the RDBMSes
Steps 1-2 happen daily, steps 3,4,5 are repeated many times with the output of steps 1-2.
The system has been modified to now work like this:
step | system | name | read row_sep | write row_sep |
---|---|---|---|---|
1 | ruby (rails) | DB to CSV (exporting) | "\n" |
|
2 | ruby | CSV to CSV (filtering) | "\n" |
"\n" |
3 | ruby | CSV to CSV-like (transforming) | "\n" |
various things, including "|\n" |
4 | RDBMS | CSV-like to DB | various things, including "|\n" |
With this change, we no longer need to read CSV files with row_sep: "|\n"
, which means this bug no longer affects us.
The reason we were reading with that row separator is because it is convenient to use the same row separator from start to end of the process. It's OK to have to use standard CSV in steps 1 and 2. The actual data we are processing is not relevant to the process or the usage of that row separator.
Therefore, in my opinion, the main bug is that you can set anything you want as row_sep
during reading, but the reader will fail to read files with non-standard row_sep values.
from csv.
Thanks. I understand.
I'll think about how to deal with this case.
from csv.
Related Issues (20)
- ArgumentError: unknown encoding name - iso-8859-1|utf-8 HOT 2
- Feature Request: Generate CSV String from Array HOT 5
- #eof? method returning wrong value when it's used on a csv file HOT 3
- #eof? method returning wrong value when it's used on a csv file with #each, #map, #filter HOT 1
- row access method(like .first, .count, .map) remove row unintentionally HOT 2
- New release for Ruby 3.2 HOT 7
- How about GH releases generated by `gh release create --generate-notes` HOT 5
- feature: add option to limit length of strings HOT 3
- Suggestion to add `sep` option HOT 1
- Broken links in documention HOT 11
- Duplicated last line in CSV.foreach HOT 14
- Recipes not copied downstream HOT 6
- A fiber to execute ':heder_converters' has been changed since v3.2.6 HOT 1
- liberal parsing does not split column as expected with quote at end of column HOT 8
- Inconsistent behaviour between `CSV::Table.new` and `CSV.parse` HOT 10
- Loading line becomes strange when skip_lines is specified HOT 6
- Read StringIO with foreach HOT 2
- feature request: Warn when reading BOM text with headers option HOT 16
- Rename the default branch name to "main" from "master"
- CSV.filter(input, output, headers: true) does not write the header row to output! HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from csv.