bytelines's Introduction

bytelines

This library provides an easy way to read input lines as byte slices for high efficiency. It's essentially lines() from the standard library, except that it yields each line as a byte slice (&[u8]) rather than a String. This is significantly faster than lines() when you don't particularly care about Unicode, and about as fast as writing the loops out by hand. Although the code itself is somewhat trivial, I've had to roll this into at least 4 tools I've written recently, so I figured it was time to have a convenience crate for it.
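For comparison, the hand-written loop that the crate saves you from writing can be sketched with only the standard library: a single `Vec<u8>` buffer is reused for every line and filled with `BufRead::read_until`. The helper name `for_each_byte_line` is hypothetical, and this is not the crate's internal implementation, only an illustration of the same technique.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of bytelines): calls `f` once per line of
/// `reader`, passing the line as a byte slice with a trailing "\n" or
/// "\r\n" removed. One buffer is reused, so lines are not allocated
/// individually once the buffer has grown large enough.
pub fn for_each_byte_line<R, F>(mut reader: R, mut f: F) -> io::Result<()>
where
    R: BufRead,
    F: FnMut(&[u8]),
{
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // `read_until` reports 0 bytes read only at end of input
        if reader.read_until(b'\n', &mut buf)? == 0 {
            return Ok(());
        }
        // drop a trailing "\n", then a "\r" directly before it
        let line = match buf.strip_suffix(b"\n") {
            Some(rest) => rest.strip_suffix(b"\r").unwrap_or(rest),
            None => &buf[..],
        };
        f(line);
    }
}
```

Because `f` only ever sees a borrowed slice, copying a line (for example with `to_vec()`) is left to the caller, which is the same zero-copy trade-off the crate makes.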

Installation

This crate is available on Crates.io, so you can add it as a dependency in your Cargo.toml:

[dependencies]
bytelines = "2.5"

Usage

It's quite simple: wherever you would typically call lines() on a BufRead implementor, you can now use bytelines to get a structure that walks over lines as &[u8] (and thus avoids allocations). There are two ways to use the API, and both are shown below:

use std::fs::File;
use std::io::BufReader;

use bytelines::ByteLines;

// our input file we're going to walk over lines of, and our reader
let file = File::open("./my-input.txt").expect("able to open file");
let reader = BufReader::new(file);
let mut lines = ByteLines::new(reader);

// Option 1: Walk using a `while` loop.
//
// This is the most performant option, as it avoids an allocation by
// simply referencing bytes inside the reading structure. This means
// that there's no copying at all, until the developer chooses to.
while let Some(line) = lines.next() {
    // `line` is a `Result<&[u8], io::Error>`; the slice borrows the
    // reader's internal buffer, so it is only valid until the next call
    let line = line.expect("able to read line");
    // do something with the line
}

// Option 2: Use the `Iterator` trait (an alternative to Option 1;
// `into_iter` consumes `lines`).
//
// This is more idiomatic, but each line is copied into an owned
// `Vec<u8>`, because `Iterator::next` cannot return a slice that
// borrows from the iterator's own buffer. The allocation overhead
// should be negligible except where performance is paramount.
for line in lines.into_iter() {
    // `line` is a `Result<Vec<u8>, io::Error>`
    let line = line.expect("able to read line");
    // do something with the line
}
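To see why the `Iterator` form has to allocate, here is a minimal standard-library sketch of an owned-line iterator. `OwnedByteLines` is a hypothetical type, not the crate's own: since `Iterator::next` cannot hand out a slice that borrows from the iterator itself, each line is returned as a fresh `Vec<u8>`, wrapped in `io::Result` so read errors surface per item.

```rust
use std::io::{self, BufRead};

/// Hypothetical owned-line iterator (not the bytelines type): each call to
/// `next` reads one line into a new `Vec<u8>`, so items do not borrow from
/// the iterator and can be kept after iteration moves on.
pub struct OwnedByteLines<R> {
    reader: R,
}

impl<R: BufRead> OwnedByteLines<R> {
    pub fn new(reader: R) -> Self {
        OwnedByteLines { reader }
    }
}

impl<R: BufRead> Iterator for OwnedByteLines<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        // a fresh allocation per line is the cost of the `Iterator` form
        let mut line = Vec::new();
        match self.reader.read_until(b'\n', &mut line) {
            Ok(0) => None, // end of input
            Ok(_) => {
                // drop a trailing "\n", then a "\r" directly before it
                if line.last() == Some(&b'\n') {
                    line.pop();
                    if line.last() == Some(&b'\r') {
                        line.pop();
                    }
                }
                Some(Ok(line))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
```

The payoff of the allocation is composability: items can be collected, filtered, or stored with ordinary iterator adapters, for example `collect::<io::Result<Vec<Vec<u8>>>>()`.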

As of v2.3, this crate includes minimal support for Tokio, namely for readers implementing Tokio's AsyncBufRead trait. The API closely mirrors the synchronous one and is used in much the same way.

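use bytelines::AsyncByteLines;
use futures::StreamExt; // assumes the `futures` crate, which provides `for_each`
use tokio::fs::File;
use tokio::io::BufReader;

// configure our inputs again, using `AsyncByteLines`.
let file = File::open("./my-input.txt").await?;
let reader = BufReader::new(file);
let mut lines = AsyncByteLines::new(reader);

// walk through all lines using a `while` loop; each call yields
// `Result<Option<&[u8]>, _>`, so `?` propagates I/O errors
while let Some(line) = lines.next().await? {
    // do something with the line
}

// alternatively, walk through all lines using `Stream` APIs (this
// consumes `lines`); `for_each` expects a closure returning a future,
// and the combined future must be awaited to do any work
lines
    .into_stream()
    .for_each(|line| async move {
        // do something with the line
    })
    .await;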

The main difference is that the Tokio implementations yield Result<Option<&[u8]>, _> instead of Option<Result<&[u8], _>>, for consistency with the existing Tokio APIs. If you don't want Tokio support, please disable default features:

[dependencies]
bytelines = { version = "2.5", default-features = false }

Tokio will be removed from the default features in the next major version (v3.0), but for now you can exclude it this way.
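The two item shapes carry the same information, and the standard library converts between them with `Option::transpose` and `Result::transpose`. The helper names below (`to_tokio_shape`, `to_sync_shape`, `total_len`) are hypothetical and only illustrate the conversion; `total_len` shows why the `Result<Option<_>>` shape is convenient, since `?` can sit directly in a `while let` condition, as in the Tokio loop above.

```rust
use std::io;

/// Hypothetical helper: the synchronous shape `Option<Result<T, E>>`
/// converted into the Tokio shape `Result<Option<T>, E>`.
pub fn to_tokio_shape<T>(item: Option<io::Result<T>>) -> io::Result<Option<T>> {
    item.transpose()
}

/// Hypothetical helper: the Tokio shape converted back into the
/// synchronous one.
pub fn to_sync_shape<T>(item: io::Result<Option<T>>) -> Option<io::Result<T>> {
    item.transpose()
}

/// Hypothetical consumer of the Tokio shape: sums line lengths, with `?`
/// returning early on the first error and `None` ending the loop.
pub fn total_len<F>(mut next: F) -> io::Result<usize>
where
    F: FnMut() -> io::Result<Option<Vec<u8>>>,
{
    let mut total = 0;
    while let Some(line) = next()? {
        total += line.len();
    }
    Ok(total)
}
```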

bytelines's People

Contributors

dandavison, jguhlin, whitfin

bytelines's Issues

empty line without \r causes panic

use std::io;
extern crate bytelines;
use bytelines::*;

fn main() {
    let stdin = io::stdin();
    for _ in stdin.lock().byte_lines() {}
}
$ echo | cargo run
thread 'main' panicked at 'attempt to subtract with overflow', …/bytelines-2.2.1/src/lib.rs:129:36

And the line that panics is:

// also "pop" a leading \r
if self.buffer[n - 1] == b'\r' {
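The overflow suggests that, for an empty line, `n` has already been reduced to 0 by removing its `\n`, so `n - 1` underflows. A minimal sketch of an underflow-free alternative, assuming the goal is to drop a trailing `\n` and a `\r` directly before it, uses checked slice methods instead of indexing. `trim_line_ending` is a hypothetical helper, not the fix the crate actually shipped.

```rust
/// Hypothetical helper: strips one trailing "\n" and, only if that
/// succeeded, one "\r" directly before it. `strip_suffix` returns `None`
/// instead of indexing, so an empty remainder cannot underflow.
pub fn trim_line_ending(line: &[u8]) -> &[u8] {
    match line.strip_suffix(b"\n") {
        // treat "\r" as part of the ending only when it precedes "\n"
        Some(rest) => rest.strip_suffix(b"\r").unwrap_or(rest),
        None => line,
    }
}
```

The same guard could also be written with lengths, e.g. `n > 0 && buf[n - 1] == b'\r'`; the slice methods simply make the empty case impossible to forget.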

Add LICENSE file

There are many MIT variants out there, so it would be very helpful if you could put the specific one you use into a LICENSE file and release a new version. This is essentially a requirement for packaging this crate in Fedora.

Add other runtimes, remove Tokio from default features

There are many other popular runtimes besides Tokio, and we should add modules for these runtimes too. The two that come to mind are smol and async_std, but there are probably others.

At the same time, all of these runtimes should be off by default and opt-in via features in the dependencies. Obviously this will all require a major version bump, so we can consider any API changes here too.

Add iterator that includes offset

It would be great if there were a function with identical usage that also includes the byte offset in the iterator item. I need this because I need to remove some characters from a specific line, but can't, because the byte offset I have (from .take(n).fold(0, |count, v| count + v.len())) does not take into account the bytes of the newline characters themselves.
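One way to get offsets that account for line endings is to advance a running offset by the number of bytes `BufRead::read_until` reports, since that count includes the `\n` (and any `\r` before it). `lines_with_offsets` below is a hypothetical standard-library sketch, not a bytelines API; for simplicity it collects `(offset, line)` pairs into a `Vec` rather than exposing an iterator.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: returns each line's starting byte offset in the
/// input together with its contents (line ending stripped).
pub fn lines_with_offsets<R: BufRead>(mut reader: R) -> io::Result<Vec<(usize, Vec<u8>)>> {
    let mut out = Vec::new();
    let mut offset = 0;
    let mut buf = Vec::new();
    loop {
        buf.clear();
        // `read` counts every byte consumed, including "\r\n" or "\n"
        let read = reader.read_until(b'\n', &mut buf)?;
        if read == 0 {
            return Ok(out);
        }
        let line = match buf.strip_suffix(b"\n") {
            Some(rest) => rest.strip_suffix(b"\r").unwrap_or(rest),
            None => &buf[..],
        };
        out.push((offset, line.to_vec()));
        // advancing by `read` rather than `line.len()` keeps later
        // offsets aligned with the original input
        offset += read;
    }
}
```

With these offsets, the bytes of a given line in the original input occupy `offset..offset + line.len()`, which is the range needed to edit that line in place.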

Documentation contains incorrect types

Apologies if I'm misunderstanding some Rust conventions, but it seems to me that the docs are misleading about the types yielded when iterating over lines. E.g.

// do something with the line, which is &[u8]

whereas really (AIUI) it's Result<&[u8], Error>. The same is true of several other comments (and therefore of the official docs on crates.io).
