Comments (4)
Being able to set the line delimiter is handy when parsing FASTA files efficiently in bioinformatics.
It can also be used to deal with output from some common Unix tools (for example, the -print0 option of find).
It is not a huge need most of the time, but I personally miss it when it is absent. (The lack of a line delimiter option in Python's readline makes it about 3x slower than Perl for parsing a FASTA of the human genome) [1].
from readr.
Just to be clear, with FASTA, you'd want to use > as a line delimiter? (And then you ignore new lines?)
from readr.
Just to be clear, with FASTA, you'd want to use > as a line delimiter? (And then you ignore new lines?)
Yes, with the added complexity that the first line of each record is the header, so handling that would take an additional step. Doing so lets you do 24 read calls vs ~60 million if you are reading with newline as the delimiter, which is where you lose most of the performance.
FASTA files are probably better read by a custom function anyway (of which there are a number); it was just an example of where I have appreciated the option.
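To make the idea concrete, here is a rough Python sketch of treating > as the record delimiter and then dropping the newlines inside each record. The function name parse_fasta and the sample data are made up for illustration, and it assumes the whole input fits in memory and that > never appears inside a header line; a real FASTA reader would need to be more careful.

```python
def parse_fasta(text):
    """Split FASTA text on '>' and return a list of (header, sequence) pairs.

    Hypothetical helper for illustration only: assumes '>' appears
    only at the start of a record and that the input fits in memory.
    """
    records = []
    for chunk in text.split(">"):
        if not chunk.strip():
            continue  # skip the empty piece before the first '>'
        # The first line of each record is the header (the extra step).
        header, _, body = chunk.partition("\n")
        # Ignore newlines (and any other whitespace) inside the sequence.
        sequence = "".join(body.split())
        records.append((header.strip(), sequence))
    return records


example = ">chr1 test\nACGT\nTTGA\n>chr2\nGGCC\n"
records = parse_fasta(example)
```

Each element of records is one (header, sequence) pair, so a file with one record per chromosome yields one item per chromosome rather than one per line.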
I actually think parsing null-delimited strings (i.e. the output of find -print0) is a more persuasive argument for having the option; it is very handy when you are dealing with filenames containing embedded whitespace. However, playing around with it a bit this morning, it looks like R errors with Error: embedded nul in string: '\0' if you try to construct a string with a null byte, so that use case does not even seem possible. :(
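For comparison, this is roughly what the null-delimited case looks like in a language that can hold a NUL byte in a byte string. It is a Python sketch, not R; the helper name split_nul_delimited and the sample bytes are invented for illustration, standing in for what find -print0 would write to a pipe.

```python
import os


def split_nul_delimited(data):
    """Split NUL-delimited bytes (as from `find -print0`) into filenames.

    Hypothetical helper for illustration; `data` is a bytes object.
    """
    # The delimiter is the NUL byte, so spaces and newlines inside
    # a filename are preserved. Empty pieces (e.g. after the trailing
    # NUL) are dropped.
    return [os.fsdecode(part) for part in data.split(b"\0") if part]


# Stand-in for the bytes that `find . -print0` would produce.
sample = b"my file.txt\0dir with space/a b.csv\0"
names = split_nul_delimited(sample)
```

Splitting at the byte level and decoding afterwards is the design choice that sidesteps the problem: no string containing a NUL ever has to be constructed.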
from readr.
I think it makes sense to keep it as an option - it's not terribly hard to add, and it fits in nicely with the system of parsing specifications that I'm developing.
from readr.