
Comments (4)

jimhester commented on June 15, 2024

Being able to set the line delimiter is handy for parsing FASTA files efficiently in bioinformatics.

It can also be used to deal with output from some common Unix tools (the -print0 option of find, for example).

It is not a huge need most of the time, but I personally miss it when it is absent. (The lack of a line delimiter option in Python's readline makes it about 3x slower than Perl for parsing a FASTA of the human genome [1].)

from readr.

hadley commented on June 15, 2024

Just to be clear, with FASTA, you'd want to use > as a line delimiter? (And then you ignore new lines?)


jimhester commented on June 15, 2024

Just to be clear, with FASTA, you'd want to use > as a line delimiter? (And then you ignore new lines?)

With the added complexity that the first line for each record is the header, so that would take an additional step. Doing so lets you do 24 read calls vs ~60 million if you are reading with newline as the delimiter, which is where you lose most of the performance.
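To make the idea concrete, here is a minimal Python sketch of reading a stream with > as the record delimiter and then splitting off the header line. The names read_records and parse_fasta are hypothetical and are not part of readr or any FASTA library. The sketch assumes a single-character delimiter and assumes that > never appears inside a header or sequence line, which real files do not guarantee.

```python
import io


def read_records(stream, delimiter=">", chunk_size=1 << 16):
    """Yield pieces of a text stream split on `delimiter` instead of newline.

    Hypothetical helper; assumes `delimiter` is a single character.
    """
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last delimiter is complete; keep the tail.
        *complete, buffer = buffer.split(delimiter)
        yield from complete
    yield buffer  # whatever follows the final delimiter


def parse_fasta(stream):
    """Yield (header, sequence) pairs, one per '>'-delimited record."""
    for record in read_records(stream, ">"):
        if not record.strip():
            continue  # skip the empty piece before the first '>'
        # The extra step: the first line of each record is the header.
        header, _, body = record.partition("\n")
        yield header.strip(), body.replace("\n", "")


if __name__ == "__main__":
    sample = io.StringIO(">seq1 example\nACGT\nTTAA\n>seq2\nGG\n")
    for header, sequence in parse_fasta(sample):
        print(header, sequence)
```

Each record arrives as one piece, so the number of pieces handled scales with the number of records rather than the number of lines.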

FASTA files are probably better read by a custom function anyway (of which there are a number); it was just an example of where I have appreciated the option.

I actually think parsing null-delimited strings (i.e. find -print0) is a more persuasive argument for having the option; it is very handy when you are dealing with filenames containing embedded whitespace. However, playing around with it a bit this morning, it looks like R fails with Error: embedded nul in string: '\0' if you try to construct a string with a null byte, so that use case does not even seem possible. :(
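For comparison, here is a minimal sketch of the null-delimited case in Python, where byte strings may contain NUL, so the delimiter can be expressed directly. This illustrates the use case only and says nothing about how R or readr would handle it. The helper names split_nul and find_files are hypothetical, and find_files assumes a find binary that supports -print0 is on the PATH.

```python
import os
import subprocess


def split_nul(data: bytes):
    """Split NUL-delimited bytes into a list of decoded file names."""
    # Names may contain spaces or newlines; only NUL separates them.
    return [os.fsdecode(part) for part in data.split(b"\0") if part]


def find_files(root="."):
    """Run `find root -print0` and return the paths it reports."""
    result = subprocess.run(
        ["find", root, "-print0"], capture_output=True, check=True
    )
    return split_nul(result.stdout)


if __name__ == "__main__":
    for path in find_files("."):
        print(repr(path))
```

Splitting on NUL keeps a name such as "my file.txt" intact, whereas splitting on whitespace or newlines would break it apart.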


hadley commented on June 15, 2024

I think it makes sense to keep it as an option: it's not terribly hard to add, and it fits in nicely with the system of parsing specifications that I'm developing.
