encoding_rs_io

This crate provides streaming adapters for the encoding_rs crate. Adapters implement the standard library I/O traits and provide streaming transcoding support.

Documentation

https://docs.rs/encoding_rs_io

Usage

Add this to your Cargo.toml:

[dependencies]
encoding_rs_io = "0.1"

and this to your crate root:

extern crate encoding_rs_io;

Example

This example shows how to create a decoder that transcodes UTF-16LE (the source, indicated by a BOM) to UTF-8 (the destination).

extern crate encoding_rs;
extern crate encoding_rs_io;

use std::error::Error;
use std::io::Read;

use encoding_rs_io::DecodeReaderBytes;

fn main() {
    example().unwrap();
}

fn example() -> Result<(), Box<dyn Error>> {
    let source_data = &b"\xFF\xFEf\x00o\x00o\x00b\x00a\x00r\x00"[..];
    // N.B. `source_data` can be any arbitrary io::Read implementation.
    let mut decoder = DecodeReaderBytes::new(source_data);

    let mut dest = String::new();
    // decoder implements the io::Read trait, so it can easily be plugged
    // into any consumer expecting an arbitrary reader.
    decoder.read_to_string(&mut dest)?;
    assert_eq!(dest, "foobar");
    Ok(())
}
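To make the transformation in the example concrete, here is a minimal standard-library-only sketch of what "UTF-16LE with a BOM to UTF-8" means for an in-memory buffer. The function name `decode_utf16le_with_bom` is hypothetical and is not part of this crate or of encoding_rs. It is not how DecodeReaderBytes is implemented, and unlike the crate it does not stream.

```rust
/// Hypothetical helper (not part of encoding_rs_io): decodes `bytes` as
/// UTF-16LE if they start with the UTF-16LE BOM (FF FE), replacing
/// malformed sequences with U+FFFD. Returns `None` when the BOM is absent.
fn decode_utf16le_with_bom(bytes: &[u8]) -> Option<String> {
    if !bytes.starts_with(&[0xFF, 0xFE]) {
        return None;
    }
    let body = &bytes[2..];
    // Pair up bytes into little-endian 16-bit code units. A trailing odd
    // byte cannot form a code unit, so it becomes U+FFFD.
    let units = body.chunks(2).map(|pair| match pair {
        [lo, hi] => u16::from_le_bytes([*lo, *hi]),
        _ => 0xFFFD,
    });
    // Unpaired surrogates are also replaced with U+FFFD.
    let text = std::char::decode_utf16(units)
        .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER))
        .collect();
    Some(text)
}
```

Called on the same `source_data` bytes as the example above, this sketch yields the string "foobar".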

Future work

Currently, this crate only provides a way to get possibly valid UTF-8 from some source encoding. There are other transformations that may be useful that we could include in this crate. Namely:

  • An encoder that accepts an arbitrary std::io::Write implementation and takes valid UTF-8 and transcodes it to a selected destination encoding. This encoder would implement std::fmt::Write.
  • A decoder that accepts an arbitrary std::fmt::Write implementation and takes arbitrary bytes and transcodes them from a selected source encoding to valid UTF-8. This decoder would implement std::io::Write.
  • An encoder that accepts an arbitrary UnicodeRead implementation and takes valid UTF-8 and transcodes it to a selected destination encoding. This encoder would implement std::io::Read.
  • A decoder that accepts an arbitrary std::io::Read implementation and takes arbitrary bytes and transcodes them from a selected source encoding to valid UTF-8. This decoder would implement the UnicodeRead trait.

Where UnicodeRead is a hypothetical trait that does not yet exist. Its definition might look something like this:

trait UnicodeRead {
    fn read(&mut self, buf: &mut str) -> std::io::Result<usize>;
}
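As a rough illustration of what implementing such a trait could involve, here is a sketch for a reader backed by an in-memory `&str`. Everything here is an assumption made for illustration: `UnicodeRead` is the hypothetical trait from above, and `StrReader` is an invented type. The sketch pads the unused tail of the buffer with NUL bytes so that the whole `&mut str` stays valid UTF-8, which is one possible policy and not a settled design.

```rust
use std::io;

/// The hypothetical trait described above; it does not exist in any crate.
trait UnicodeRead {
    fn read(&mut self, buf: &mut str) -> io::Result<usize>;
}

/// Hypothetical reader over an in-memory string slice.
struct StrReader<'a> {
    rest: &'a str,
}

impl<'a> UnicodeRead for StrReader<'a> {
    fn read(&mut self, buf: &mut str) -> io::Result<usize> {
        // Find the longest prefix of `rest` that fits in `buf` and ends on
        // a character boundary. Index 0 is always a boundary, so this loop
        // terminates.
        let mut n = self.rest.len().min(buf.len());
        while !self.rest.is_char_boundary(n) {
            n -= 1;
        }
        // SAFETY: we write `n` bytes of valid UTF-8 followed by NUL bytes
        // (each valid UTF-8 on its own), so `buf` is valid UTF-8 again
        // before the mutable byte view goes out of scope.
        let bytes = unsafe { buf.as_bytes_mut() };
        bytes[..n].copy_from_slice(self.rest[..n].as_bytes());
        bytes[n..].fill(0);
        self.rest = &self.rest[n..];
        Ok(n)
    }
}
```

One caveat of this sketch: if `buf` is shorter than the next character, it returns `Ok(0)` even though input remains, so a real design would need a way to distinguish that case from end of input.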

Interestingly, none of the above transformations corresponds to DecodeReaderBytes. DecodeReaderBytes is closest to the last option, but instead of guaranteeing valid UTF-8 by implementing a trait like UnicodeRead, it implements std::io::Read, which pushes UTF-8 handling on to the caller. However, this particular use case turns out to be important for operations like search, which can often be written in a way that doesn't assume UTF-8 validity but still benefits from it.
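To illustrate the search use case, the sketch below looks for a byte sequence in the output of any std::io::Read implementation without requiring that output to be valid UTF-8. The function `contains_bytes` is a hypothetical helper, not part of this crate. It buffers the whole input and does a naive scan for simplicity, where a real search tool would work incrementally.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reports whether `needle` occurs anywhere in the
/// bytes produced by `rdr`. The bytes are never validated as UTF-8.
fn contains_bytes<R: Read>(mut rdr: R, needle: &[u8]) -> io::Result<bool> {
    let mut haystack = Vec::new();
    rdr.read_to_end(&mut haystack)?;
    // `windows(0)` would panic, so treat the empty needle as always found.
    Ok(needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle))
}
```

Because the search operates on raw bytes, input such as `b"foo\xFFbar"` can still be searched for `b"bar"`, whereas `read_to_string` would return an error on the invalid `\xFF` byte.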

It's not clear which of the above transformations is actually useful, but all of them could theoretically exist. There is more discussion on this topic in hsivonen/encoding_rs#8 (in particular, the above formulation was taken almost verbatim from Simon Sapin's comments there).

It is also perhaps worth stating that this crate very much intends to remain coupled to encoding_rs, which helps restrict the scope, but may be too biased toward Web-oriented encodings to solve grander encoding challenges. As such, it may very well be that this crate is actually a stepping stone to something with a larger scope. But first, we must learn.

License

This project is licensed under either of

at your option.
