
Comments (8)

kou commented on July 29, 2024

"" is processed as an escaped ", like "\"" in a Ruby String.
This is a common convention in CSV. See also RFC 4180: https://datatracker.ietf.org/doc/html/rfc4180#section-2

   7.  If double-quotes are used to enclose fields, then a double-quote
       appearing inside a field must be escaped by preceding it with
       another double quote.  For example:

       "aaa","b""bb","ccc"
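
As a minimal sketch of that rule in practice, the RFC example line can be fed to Ruby's CSV parser with default options. The variable name `row` is only for illustration:

```ruby
require "csv"

# The RFC 4180 example line: the doubled quote inside the second field
# is read as one literal double quote.
row = CSV.parse_line('"aaa","b""bb","ccc"')
pp row
# => ["aaa", "b\"bb", "ccc"]
```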

from csv.

mreinsch commented on July 29, 2024

@kou thanks for looking into this. Makes sense.

I suppose a way forward could be to provide an option that tells the parser that no escaping is used in the data. I could look into that if that's something you'd be open to adding.

Though I'm actually thinking that in my case it'd be easier to implement a TSV parser as tabs aren't allowed in values... but it'd be nice to provide a common way to deal with such cases.


kou commented on July 29, 2024

Can we use col_sep: "\t"?


mreinsch commented on July 29, 2024

@kou not sure what you mean, the problem is the same with col_sep: "\t". In that case the \t becomes part of the string.


kou commented on July 29, 2024

I thought that your original data uses \t as a separator and that no column includes \t:

The original source is actually using tabs to separate the columns (which aren't allowed in the data)


mreinsch commented on July 29, 2024

Yes, that's correct. But as the CSV parser converts "" to ", the same issue happens:

CSV.parse_line(%Q{"size: 2" "\t'time: 3''\t"test"}, col_sep: "\t", liberal_parsing: true)
=> ["\"size: 2\" \"", "'time: 3''", "test"]

CSV.parse_line(%Q{"size: 2""\t'time: 3''\t"test"}, col_sep: "\t", liberal_parsing: true)
=> ["\"size: 2\"\t'time: 3''\t\"test\""]

Are you suggesting that the CSV parser should behave differently if tab is used as col_sep?

As I mentioned, I could probably just split the string on "\t" and hence not use the CSV parser at all / implement a simplified TSV parser. I was more wondering if you'd consider adjusting the CSV parser to handle this.
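
A minimal sketch of that split-based approach, assuming tabs never occur inside a value. `split_tsv_line` is a hypothetical helper, not part of the csv library, and it leaves any quote characters in the values untouched:

```ruby
# Hypothetical helper (not part of the csv library): split one line on tabs.
# Assumes tabs never occur inside a value, as stated for this data.
def split_tsv_line(line)
  # chomp drops the trailing newline; the -1 limit keeps trailing empty fields.
  line.chomp.split("\t", -1)
end

fields = split_tsv_line(%Q{"size: 2""\t'time: 3''\t"test"\n})
pp fields
# => ["\"size: 2\"\"", "'time: 3''", "\"test\""]
```

Because nothing is treated as an escape here, the doubled "" survives as two characters instead of collapsing into one.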


kou commented on July 29, 2024

Ah, you just used CSV not TSV as an example to show your use case, right?

for some reason decided to quote each value

Is it your choice for parsing? Or was it decided by the person who created the original source?
(It seems that the first and third columns are quoted with " and the second column is quoted with ' rather than ". Is that intentional?)

If it's your choice, how about just stopping it and using quote_char: nil?

pp CSV.parse_line(%Q{size: 2" \ttime: 3'\ttest}, col_sep: "\t", quote_char: nil)
# ["size: 2\" ", "time: 3'", "test"]
pp CSV.parse_line(%Q{size: 2"\ttime: 3'\ttest}, col_sep: "\t", quote_char: nil)
# ["size: 2\"", "time: 3'", "test"]


mreinsch commented on July 29, 2024

@kou right, I suppose I used the quote_char so I don't need to strip the quotation marks afterwards, but that should be easy to do. Thanks for your help!
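
For reference, stripping the quotation marks afterwards could look roughly like the sketch below. `strip_wrapping_quote` is a hypothetical helper, and the sketch assumes a csv version that accepts quote_char: nil, as in the reply above:

```ruby
require "csv"

# Hypothetical helper: drop one wrapping quote character from each end of a
# value, but only when the value starts and ends with the same quote.
def strip_wrapping_quote(value)
  if value.length >= 2 && value[0] == value[-1] && ['"', "'"].include?(value[0])
    value[1..-2]
  else
    value
  end
end

line = %Q{"size: 2""\t'time: 3''\t"test"}
# quote_char: nil disables quote handling, so "" is not collapsed into ".
fields = CSV.parse_line(line, col_sep: "\t", quote_char: nil)
cleaned = fields.map { |field| strip_wrapping_quote(field) }
pp cleaned
# => ["size: 2\"", "time: 3'", "test"]
```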

