burntsushi / byteorder
Rust library for reading/writing numbers in big-endian and little-endian.
License: The Unlicense
I updated to a recent nightly and I can no longer build byteorder because of mismatched types.
b.rs:278:52: 278:57 error: mismatched types:
expected `*mut u8`,
found `*const u8`
(values differ in mutability) [E0308]
/home/jp/.cargo/registry/src/github.com-1ecc6299db9ec823/byteorder-0.3.3/src/lib.rs:278 copy_nonoverlapping($dst.as_mut_ptr(), bytes, $size);
Looks like it's just a one-liner
While this sounds redundant (the write methods are designed for fixed-width types), it would allow the caller to more easily compute the amount of data written. As an example use case, consider sending a small packet of data over a UDP socket. Something like:
let mut buf = [0u8; 16];
let mut n = 0;
{
    let mut packet = &mut buf[..];
    n += try!(packet.write_u16::<BigEndian>(1u16));
    n += try!(packet.write_u16::<BigEndian>(2u16));
    n += try!(packet.write_u16::<BigEndian>(3u16));
}
let sock = try!(std::net::UdpSocket::bind("0.0.0.0:0"));
try!(sock.send_to(&buf[..n], "127.0.0.1:1234"));
It gets harder to determine how many bytes may have been written for increasingly complicated packets (perhaps with conditional or variable-length fields). And if new fields are added in the future, there is less risk of the sent buffer size not getting updated to match. Writing into a vector would also provide the necessary bookkeeping, but requires a heap allocation. This would also bring the API closer in line with the `write` signature on std::io::Write.
Is there much of a performance hit in returning a hard-coded usize value instead of the empty tuple?
extern crate byteorder;
fn main() {
byteorder::LittleEndian::default();
}
thread 'main' panicked at 'internal error: entered unreachable code'
Unreachable doesn't mean "please don't call this," it means "you can't get here." I think it is okay for this method to panic but it should be something other than unreachable. A handwritten message explaining why this method exists would be nice.
cc @fitzgen
@Tobba and I have worked on https://github.com/QuiltOS/core_io, a copy of Read and Write with an associated error type, so that it only needs core. Perhaps it would be nice to (optionally) extend these traits for no_std users?
byteorder is very heavily used, but its API has mostly remained the same since it was first released (it was inspired both by Go's encoding/binary package and by the pre-existing methods in Rust's old standard library that fulfilled a similar role). There was, however, significant discussion of its API in #27, but I feel that no consensus was reached, and I don't think there's an obviously better API given Rust in its current form. Therefore, I'd like to propose that we cut a 1.0 release in the next few weeks.
I think the only outstanding issue that we should try to resolve before 1.0 is #52.
Several types do not directly contain examples, nor do many methods. It's not clear how much example coverage we need, though the policy in std is for everything to have an example.
This is the tracking issue for the evaluation performed by the libs team last week.
My Cargo.toml file looks like this:
[package]
name = "lambda-db"
version = "0.0.1"
authors = ["asdasd < [email protected]>"]
[dependencies]
bincode = "*"
I'm using the latest rustc:
rustc 1.0.0-nightly (b0aad7dd4 2015-03-22) (built 2015-03-23)
However, I am getting build errors about the undeclared trait names Reader and Writer.
error: use of undeclared trait name `Writer`
/home/pez/.cargo/registry/src/github.com-1ecc6299db9ec823/byteorder-0.2.14/src/old.rs:225 impl<W: Writer> WriterBytesExt for W {}
I'm very new to Rust, otherwise I would fix this issue myself. Is it possible that I'm using the wrong version of Rust?
$ rustc -V
rustc 1.0.0-beta (built 2015-04-04)
$ cargo build
Updating registry `https://github.com/rust-lang/crates.io-index`
Downloading quickcheck v0.2.14
Compiling byteorder v0.3.8 (file:///Users/zimbatm/code/github.com/BurntSushi/byteorder)
src/lib.rs:87:1: 201:2 error: parameter `Self` is never used
src/lib.rs:87 pub trait ByteOrder {
src/lib.rs:88 /// Reads an unsigned 16 bit integer from `buf`.
src/lib.rs:89 ///
src/lib.rs:90 /// Task failure occurs when `buf.len() < 2`.
src/lib.rs:91 fn read_u16(buf: &[u8]) -> u16;
src/lib.rs:92
...
src/lib.rs:201:2: 201:2 help: consider removing `Self` or using a marker such as `core::marker::PhantomFn`
error: aborting due to previous error
Could not compile `byteorder`.
To learn more, run the command again with --verbose.
As I understand it, the purpose of this crate is to prepare for a world where we no longer have the endian reading/writing functions on Reader and Writer. As someone who uses those functions a lot, I would like to prepare my crates (namely bincode) for the new io crate. I'd be willing to help do the port if you are interested.
It makes me a little sad to see unsafe being used to convert a &[u32] into a &[u8] in octavo:
fn crypt(&mut self, input: &[u8], output: &mut [u8]) {
    assert_eq!(input.len(), output.len());
    if self.index == STATE_BYTES { self.update() }
    let buffer = unsafe {
        slice::from_raw_parts(self.buffer.as_ptr() as *const u8, STATE_BYTES)
    };
    for i in self.index..input.len() {
        output[i] = input[i] ^ buffer[i];
    }
    self.index = input.len();
}
We really ought to have a place to centralize this functionality so that it's well tested and safe across our ecosystem. Would it make sense to have this functionality be in byteorder?
It would also be interesting to support the inverse operation, where a &[u8] is converted into a (&[u8], &[u32], &[u8]), where the first and last slices are there to read a byte at a time until the slice is aligned. This style of operation could be useful for safely massaging a slice into something that can use SIMD (or at least SIMD-ish operations over a usize value).
The write_* methods of the ByteOrder trait accept a buffer and don't guarantee that they won't read from it. The drawback is that, strictly speaking, the provided buffer must not be uninitialized. I suggest providing some way of guaranteeing that the buffer won't be read from, so that passing an uninitialized buffer is fine.
Using ReadBytesExt is very noisy due to having to specify the endianness in every call:
let foo = reader.read_u16::<BigEndian>()?;
let bar = reader.read_u16::<BigEndian>()?;
let baz = reader.read_u32::<BigEndian>()?;
It would be nice if there was a reader adapter which stored the endianness to DRY this up:
pub struct ReadBytes<R, E> {
inner: R,
endianness: PhantomData<E>
}
impl<R, E> ReadBytes<R, E> where R: Read, E: ByteOrder {
// Alternately place this as an adapter method on `ReadBytesExt`
pub fn new(inner: R) -> Self {...}
// Duplicate methods of `ReadBytesExt` without the endianness parameter
}
Of course, all of this applies to WriteBytesExt as well.
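To check that the shape of this proposal works, here is a compiling std-only sketch. The `Endian` trait below is a hypothetical stand-in for byteorder's `ByteOrder` (just enough surface for one method), and the names are illustrative, not byteorder's API:

```rust
use std::io::{self, Read};
use std::marker::PhantomData;

// Hypothetical stand-in for byteorder::ByteOrder.
trait Endian {
    fn u16_from(bytes: [u8; 2]) -> u16;
}
struct Le;
impl Endian for Le {
    fn u16_from(bytes: [u8; 2]) -> u16 { u16::from_le_bytes(bytes) }
}

// The proposed adapter: the endianness is fixed once, in the type.
struct ReadBytes<R, E> {
    inner: R,
    endianness: PhantomData<E>,
}

impl<R: Read, E: Endian> ReadBytes<R, E> {
    fn new(inner: R) -> Self {
        ReadBytes { inner, endianness: PhantomData }
    }
    // No per-call endianness parameter needed.
    fn read_u16(&mut self) -> io::Result<u16> {
        let mut buf = [0u8; 2];
        self.inner.read_exact(&mut buf)?;
        Ok(E::u16_from(buf))
    }
}
```

Each subsequent call site then shrinks to `reader.read_u16()?` with no turbofish.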
From a discussion on #rust-beginners, there seems to be a need for a ByteOrder implementation that dispatches to LE/BE based on runtime information.
Essentially something like this:
enum Endianess { Little, Big }
impl ByteOrder for Endianess {
// boilerplate methods with `match` that dispatch to LE or BE
}
let endianess = get_endianess_at_runtime();
endianess.read_i32(&some_bytes);
The byteorder docs don't seem to say that the crate is focused solely on static/type-level checking, so I'm guessing this would be in scope for the library. Of course, this isn't strictly necessary, as you can probably just write the reading/writing code generically and move the LE/BE decision to a higher level, but it may simplify some use cases regardless.
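The match-based dispatch above can be sketched with std alone (the names here are illustrative; byteorder's trait methods are static, so a real impl would need the boilerplate per method):

```rust
use std::convert::TryInto;

#[derive(Clone, Copy)]
enum Endianness { Little, Big }

// Reads an i32 from the front of `buf`, dispatching on runtime endianness.
// Panics if `buf` holds fewer than 4 bytes.
fn read_i32_at(e: Endianness, buf: &[u8]) -> i32 {
    let bytes: [u8; 4] = buf[..4].try_into().unwrap();
    match e {
        Endianness::Little => i32::from_le_bytes(bytes),
        Endianness::Big => i32::from_be_bytes(bytes),
    }
}
```

A file parser can then pick the variant after reading the header byte and thread it through.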
It currently suggests the following:
[dependencies]
byteorder = "*"
In semantic versioning, the major version number indicates breaking changes, so this pattern could cause unexpected breakage in downstream libraries when 1.0 comes around. I would suggest the following:
[dependencies]
byteorder = "0.*"
This still uses a wildcard, so it's easy to keep up to date.
It seems like they are really missing! The 1.1.0 release introduced slice methods for the ReadBytesExt trait (read_*_into). It would be useful to have the corresponding write methods for the WriteBytesExt trait.
Please add support for the 24-bit int (little, big, signed, unsigned). In most cases, this is how i24s are implemented: take the three bytes (0x00, 0x00, 0x00), pad them with another null byte, and read the result as a regular int (0x00, 0x00, 0x00, 0x00).
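The pad-then-read approach can be sketched with std alone (function names are hypothetical; for the signed case, the pad goes on the low end and an arithmetic shift performs the sign extension):

```rust
// Unsigned: pad the high end with a zero byte, then read as u32.
fn read_u24_le(bytes: [u8; 3]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], 0x00])
}

// Signed: place the 3 bytes in the high end, then use an arithmetic
// right shift to sign-extend down to 24 bits of meaningful value.
fn read_i24_le(bytes: [u8; 3]) -> i32 {
    i32::from_le_bytes([0x00, bytes[0], bytes[1], bytes[2]]) >> 8
}
```

The big-endian variants are symmetric, with the pad byte on the other side.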
The current implementation uses &[u8] for reading and &mut [u8] for writing, which panics if the lengths are invalid. I suggest implementing alternative functions like read_arr_u16(buf: &[u8; 2]) (name subject to bikeshedding) that take a reference to an array. This would come in handy in combination with const generics, where code could guarantee the absence of panics (due to programmer error) and the elision of (unnecessary) checks. Once const generics are in Rust, the old (&[u8]) versions could be implemented using the new ones. I guess this could be done with macros.
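A minimal sketch of the proposed shape, with hypothetical names, using std conversions: the array-typed function cannot panic, and slice callers do the fallible length check exactly once.

```rust
use std::convert::TryInto;

// Length is enforced by the type; this function has no panic path.
fn read_arr_u16_le(buf: &[u8; 2]) -> u16 {
    u16::from_le_bytes(*buf)
}

// Slice-based callers convert once, handling the short-slice case
// explicitly instead of via a panic.
fn read_u16_le_checked(slice: &[u8]) -> Option<u16> {
    let arr: &[u8; 2] = slice.get(..2)?.try_into().ok()?;
    Some(read_arr_u16_le(arr))
}
```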
I use byteorder in my rawloader crate to read binary files with specific orderings, for example reading 16-bit unpacked little-endian data. On little-endian machines this could just be a memcpy. Currently rustc isn't able to optimize loops into memcpy, but maybe it would make sense to have an explicit API for this inside byteorder, the same way copy_from_slice already allows it for standard types. That way I could call a LittleEndian::read_u16s(from, to, n) function in byteorder; if the architecture matches, it does a memcpy, and if not, it does a loop or even uses a BSWAP instruction. Would that make sense as part of the scope of this crate?
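For reference, the bulk conversion being described can be written as a plain loop with std (the function name mirrors the proposed read_u16s, but this is a sketch, not byteorder's API); a dedicated API could replace the loop body with a memcpy on matching architectures:

```rust
// Decodes `dst.len()` little-endian u16s from `src`.
// Panics if `src` is not exactly twice as long as `dst`.
fn read_u16s_le(src: &[u8], dst: &mut [u16]) {
    assert_eq!(src.len(), 2 * dst.len());
    for (chunk, out) in src.chunks_exact(2).zip(dst.iter_mut()) {
        *out = u16::from_le_bytes([chunk[0], chunk[1]]);
    }
}
```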
I'm frequently in need of accessing individual bytes of an integer (be it when writing emulators or implementing network protocols), and I thought this functionality would be a good addition to the byteorder crate (since it is inherently dependent on the byte order). One way it could work is by defining extension traits for all integer types that look like this:
trait U32Ext {
fn bytes<B: ByteOrder>(self) -> [u8; 4];
}
It can be used like this:
let value = 123456i32;
let msb = value.bytes::<BigEndian>()[0];
let lsb = value.bytes::<BigEndian>()[3];
// These are equivalent:
let msb = value.bytes::<LittleEndian>()[3];
let lsb = value.bytes::<LittleEndian>()[0];
Of course, the methods must be marked #[inline] for reasonable performance (I've confirmed that these expressions are optimized properly, so only the computation for the accessed byte remains).
Thoughts?
Sometimes it is impossible to statically determine the required endianness in advance. For example, the TIFF image format defines endianness in the first byte of an input file, so it may be either big or little, but which one is unknown statically. It would be nice if I could use byteorder for this task too.
Please support usize; currently it's not possible to write one without something like this:
use std::mem;
use byteorder::{BigEndian, WriteBytesExt};

let length = "Hello World!".len();
let mut wtr = vec![];
wtr.write_uint::<BigEndian>(length as u64, mem::size_of::<usize>()).unwrap();
When I follow the documentation link on crates.io or in the README, the documentation is not in sync with the repo/crates.io. For example, I was looking at the Result returned by ReadBytesExt::read_u32(), which is different in 0.5.0.
If there is no problem with this, then I can steal the implementation from old_io and file a pull request.
u128/i128 support?
Upstream tracking issue: rust-lang/rust#35118
It would be nice if we could change our git dependencies to version dependencies on crates.io.
I think the APIs should return io::Result instead of byteorder::Result. It would be much easier to migrate the libraries that depend on old_io to the new io. I suggest the UnexpectedEOF error could be put into io::Error with the io::ErrorKind::Other kind. I could submit a PR.
As of this moment, the read_f32 and read_f64 methods will bitcast any sequence of 4/8 bytes to an f32/f64 and will never fail, even if the resulting float is a "signaling NaN." In particular, materializing signaling NaNs appears to be undefined behavior, although the topic is pretty cloudy. See rust-lang/rust#39271 for more details.
I think what this means is that these functions need to be modified to return a Result, and probably a custom error type as well, although it would be nice to just use whatever error type we end up with from rust-lang/rust#39271. This is a pretty major breaking change, so it will require a 2.0 release.
There is some confusion (at least in my head) around whether signaling NaN's are actually unsafe or not.
Like:
for word in file.iter_u16::<byteorder::LittleEndian>() {
    ...
}
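The requested adapter could be built on top of io::Read with std alone; a minimal sketch with hypothetical names (a trailing odd byte is treated as end-of-input here, which a real API might instead report as an error):

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical adapter: yields little-endian u16s until EOF.
struct U16LeIter<R> {
    inner: R,
}

impl<R: Read> Iterator for U16LeIter<R> {
    type Item = io::Result<u16>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = [0u8; 2];
        match self.inner.read_exact(&mut buf) {
            Ok(()) => Some(Ok(u16::from_le_bytes(buf))),
            // A clean EOF (or trailing odd byte) ends the iteration.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    }
}
```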
The documentation at http://burntsushi.net/rustdoc/byteorder seems to be outdated. The biggest issue that really confused me was that the Error type was removed (http://burntsushi.net/rustdoc/byteorder/enum.Error.html).
Please consider adding a NativeEndian byte order which corresponds to the byte order used on the host system. While NativeEndian can be dangerous for cross-platform compatibility, sometimes it is necessary. For example, some (poorly defined) file formats may use platform endianness for multi-byte numbers.
pub trait ByteOrder:
    Clone + Copy + Debug + Default + Eq + Hash + Ord + PartialEq + PartialOrd
I understand why these exist (#60), but I am worried about what happens if Rust adds another trait with a built-in derive that is not on this list. Our options would be to add it to the list, which would require a 2.0 release of byteorder, or to leave it off, which means the new derive won't work nicely with structs parameterized over byteorder. I think we should take steps to prevent extending this list from being a breaking change. In serde_json we use a trick like this in a similar situation:
mod private {
    pub trait Sealed {}
    impl Sealed for super::LittleEndian {}
    impl Sealed for super::BigEndian {}
}

/// This trait is sealed and cannot be implemented for types outside of byteorder.
pub trait ByteOrder:
    Clone + Copy + Debug + ... + private::Sealed
Now we are free to add supertraits and even trait methods (u128/i128 support, #65) without a major release.
The selling point of byteorder is that you can read and write little- / big-endian numbers, not that you can define your own wild and crazy byte orders, so I think it is reasonable to limit implementations to within the byteorder crate.
cc @fitzgen
Cannot build or run cargo test on 0.3.8:
$ cargo test
Compiling byteorder v0.3.8 (file:///Users/xavierlange/code/rust/byteorder)
Compiling libc v0.1.6
src/lib.rs:87:1: 201:2 error: parameter `Self` is never used
src/lib.rs:87 pub trait ByteOrder {
src/lib.rs:88 /// Reads an unsigned 16 bit integer from `buf`.
src/lib.rs:89 ///
src/lib.rs:90 /// Task failure occurs when `buf.len() < 2`.
src/lib.rs:91 fn read_u16(buf: &[u8]) -> u16;
src/lib.rs:92
...
src/lib.rs:201:2: 201:2 help: consider removing `Self` or using a marker such as `core::marker::PhantomFn`
error: aborting due to previous error
Build failed, waiting for other jobs to finish...
Could not compile `byteorder`.
Running rustc from Mac Homebrew:
$ rustc --version
rustc 1.0.0-beta (built 2015-04-03)
$ brew info rust
rust: stable 1.0.0-beta (bottled), HEAD
http://www.rust-lang.org/
/usr/local/Cellar/rust/1.0.0-beta (5619 files, 274M) *
Poured from bottle
From: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/rust.rb
Currently, it looks like the byte order must be selected in the code. It would be nice to have:
enum Endian {
Little,
Big,
}
Or something like that.
At the moment, the byteorder crate uses an enum to generalize over the endianness and the method name to select the type. Neither is very ergonomic to use. I propose the following interface:
trait WriteBytesExt<T> {
    fn write_le(&mut self, n: T) -> io::Result<()>;
    fn write_be(&mut self, n: T) -> io::Result<()>;
}

impl<W> WriteBytesExt<u8> for W where W: Write {
    fn write_le(&mut self, n: u8) -> io::Result<()> {
        ....
    }
    ....
}
First of all, it gets rid of the enum. Since the enum is purely a compile-time parameter, it cannot be used for dynamic dispatch; this is as good or as bad as having it directly in the method name, so I do not see the point of having it. Secondly, it gets rid of the redundant type name in the signature.
This shortens the method call significantly:
w.write_u16::<LittleEndian>(42u16)
becomes
w.write_le(42u16)
My two points are: the enum is purely a compile-time parameter, so it adds nothing over putting the endianness in the method name; and the *BytesExt traits cannot be used to write generic code that abstracts over endianness. Again, no benefit for the user.

Can you release a new version with the change from #9?
Most docs say the function will panic if the buffer length is less than the value's size:
/// Panics when `buf.len() < 8`.
#[inline]
fn read_i64(buf: &[u8]) -> i64 {
but the macro that reads and writes bytes compares with <=:
macro_rules! read_num_bytes {
($ty:ty, $size:expr, $src:expr, $which:ident) => ({
assert!($size == ::core::mem::size_of::<$ty>());
assert!($size <= $src.len());
Should it be assert!($size < $src.len())? Or should the docs say <=? (Edit: never mind, the macro is comparing the value's size with the buffer length, so the assertion and the docs agree.)
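To spell out why the two agree: the assertion passes exactly when the panic condition in the docs does not hold. Extracting the macro's guard as a plain function makes this checkable:

```rust
// The macro's guard, extracted: it passes iff the buffer holds at
// least `size` bytes, i.e. it fails exactly when buf.len() < size,
// which is the panic condition the docs describe.
fn guard_passes(size: usize, buf: &[u8]) -> bool {
    size <= buf.len()
}
```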
Some protocols specify 3-byte numbers (e.g. u24 in RFC 5246), and it would be really handy if byteorder supported a read_u24 function out of the box.
Right now, information about panic and error conditions is written inline with the main text. Per convention, it should be in sections titled "Panics" and "Errors".
Right now, making something compile-time generic over endianness is a huge pain:
extern crate byteorder;

use byteorder::{ByteOrder, LittleEndian};
use std::marker::PhantomData;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EndianBuf<'a, Endian>(&'a [u8], PhantomData<Endian>) where Endian: ByteOrder;

impl<'a, Endian> EndianBuf<'a, Endian> where Endian: ByteOrder {
    fn new(buf: &'a [u8]) -> EndianBuf<'a, Endian> {
        EndianBuf(buf, PhantomData)
    }
}

fn main() {
    let buf = [1, 2, 3, 4, 5, 6];
    let a = EndianBuf::<LittleEndian>::new(&buf);
    let b = EndianBuf::<LittleEndian>::new(&buf);
    // Compiler error regarding this line!
    assert_eq!(a, b);
}
If you run rustc -Z unstable-options --pretty expanded endian_buf.rs, you will see this:
#[automatically_derived]
#[allow(unused_qualifications)]
impl<'a, Endian: ::std::clone::Clone> ::std::clone::Clone for EndianBuf<'a, Endian>
    where Endian: ByteOrder
{
    // ...snip...
}
The derived impls require that the type parameters also implement the trait.
If ByteOrder implied Eq, Clone, Copy, etc., the problems would go away. Arguably, this is a bug in the #[derive(Foo)] expansion code for not understanding PhantomData, but adding this would work around that bug.
Since one can't even instantiate BigEndian or LittleEndian, all the impls could be unreachable!().
Would you be willing to accept a patch that does this?
I can't write my converter from byteorder::new::Error, i.e.:
impl FromError<byteorder::new::Error> for MyError { .. }
Therefore converting errors becomes a pain, and I must map_err every time I'm reading primitives.
Thanks for the great work in creating this library! It has been working great for me. I do have one question, though: how does one read an arbitrary number of bytes, i.e. not a fixed size? For example, I want to read 6 bytes and put them into a [u8; 6] array.
Thanks,
Superhac
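For a fixed 6-byte chunk, io::Read::read_exact from std is enough; a small sketch (the function name is illustrative):

```rust
use std::io::{self, Read};

// Reads exactly 6 bytes from any reader into a fixed-size array,
// returning an error (including UnexpectedEof) on failure.
fn read_6<R: Read>(mut r: R) -> io::Result<[u8; 6]> {
    let mut buf = [0u8; 6];
    r.read_exact(&mut buf)?;
    Ok(buf)
}
```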
The byteorder crate casts a u8 pointer to, for example, *const u64 and reads from it. This requires that the pointer is well aligned for the new element type; otherwise this is UB.
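An alignment-safe alternative is to copy into a local array first and convert from that; compilers typically lower this to a single unaligned load, so there is usually no cost. A sketch with std (not byteorder's actual fix):

```rust
// Alignment-safe read: no pointer cast, so no alignment requirement
// on `buf`. Panics if `buf` holds fewer than 8 bytes.
fn read_u64_le(buf: &[u8]) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[..8]);
    u64::from_le_bytes(bytes)
}
```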
Point it to docs.rs. byteorder doesn't have any docs there yet.
Is there any reason for it to exist?
Both io::Read and io::Write have no such bound.
I'd like to depend on this for an experiment where I'm rewriting a Python script which examines GIF files in a performance-optimized manner.
However, because of flaws in the Unlicense, it's on my blacklist, to ensure proper safety for my users regardless of the jurisdiction they're in.
Is there any chance you'd be willing to offer byteorder under something more carefully designed like the Creative Commons CC0 public domain dedication?
(CC0 is also what the FSF recommends if you want to release your code into the public domain.)