maciejhirsz / beef Goto Github PK
View Code? Open in Web Editor NEWFaster, more compact implementation of std::borrow::Cow
Home Page: https://crates.io/crates/beef
License: Apache License 2.0
Faster, more compact implementation of std::borrow::Cow
Home Page: https://crates.io/crates/beef
License: Apache License 2.0
As far as I can see, I'm not able to put own/std types into a beefcow except those that are explicitly enabled in this library. So would it be possible to add support for Path
/PathBuf
?
What are the thoughts of migrating from Travis to GitHub CI? It allows nice integrations like pull request blocking so silly people like me can't contribute memory leaks unnoticed :/ I'm happy to make a PR for that.
If i use the following code with std::Cow
- will be ok
let string = "hello".to_string();
let cow: Cow<str> = Cow::from(&string);
but beef::Cow
tells me:
error[E0277]: the trait bound `String: beef::traits::internal::Beef` is not satisfied
--> src\main.rs:6:35
|
6 | let cow: Cow<str> = Cow::from(&string);
| --------- ^^^^^^^ the trait `beef::traits::internal::Beef` is not implemented for `String`
| |
| required by a bound introduced by this call
|
= help: the trait `beef::traits::internal::Beef` is implemented for `str`
note: required by a bound in `beef::generic::Cow`
--> F:\Rust\beef\src\generic.rs:26:23
|
26 | pub struct Cow<'a, T: Beef + ?Sized + 'a, U: Capacity> {
| ^^^^ required by this bound in `beef::generic::Cow`
Maybe you need an explicit impl From<&str>
for Cow
in addition to From<&T>
?
Or is it not important?
As of Rust 1.65.0, many enums are now smaller, probably because of this new optimization:
rust-lang/rust#94075
So as of Rust 1.65.0, std::borrow::Cow
is 24 bytes in size, same as beef::Cow
(at least on 64-bit architectures, I haven't checked 32-bit architectures).
As discovered by Miri, the lean
variant of beef leaks memory. I verified that by adding this patch
diff --git a/src/generic.rs b/src/generic.rs
index add1a9b..9fa0981 100644
--- a/src/generic.rs
+++ b/src/generic.rs
@@ -170,6 +170,7 @@ where
#[inline]
fn drop(&mut self) {
if let Some(capacity) = self.capacity() {
+ panic!("dropping!");
unsafe { T::owned_from_parts::<U>(self.inner, capacity) };
}
}
and then running cargo test lean
. Clearly that should drop the contents and thus panic at some point, but it does not.
just like:
check that Cow<'a, str> implements addition
#[cfg(not(no_global_oom_handling))]
#[stable(feature = "cow_add", since = "1.14.0")]
impl<'a> Add<&'a str> for Cow<'a, str> {
type Output = Cow<'a, str>;
#[inline]
fn add(mut self, rhs: &'a str) -> Self::Output {
self += rhs;
self
}
}
#[cfg(not(no_global_oom_handling))]
#[stable(feature = "cow_add", since = "1.14.0")]
impl<'a> Add<Cow<'a, str>> for Cow<'a, str> {
type Output = Cow<'a, str>;
#[inline]
fn add(mut self, rhs: Cow<'a, str>) -> Self::Output {
self += rhs;
self
}
}
#[cfg(not(no_global_oom_handling))]
#[stable(feature = "cow_add", since = "1.14.0")]
impl<'a> AddAssign<&'a str> for Cow<'a, str> {
fn add_assign(&mut self, rhs: &'a str) {
if self.is_empty() {
*self = Cow::Borrowed(rhs)
} else if !rhs.is_empty() {
if let Cow::Borrowed(lhs) = *self {
let mut s = String::with_capacity(lhs.len() + rhs.len());
s.push_str(lhs);
*self = Cow::Owned(s);
}
self.to_mut().push_str(rhs);
}
}
}
#[cfg(not(no_global_oom_handling))]
#[stable(feature = "cow_add", since = "1.14.0")]
impl<'a> AddAssign<Cow<'a, str>> for Cow<'a, str> {
fn add_assign(&mut self, rhs: Cow<'a, str>) {
if self.is_empty() {
*self = rhs
} else if !rhs.is_empty() {
if let Cow::Borrowed(lhs) = *self {
let mut s = String::with_capacity(lhs.len() + rhs.len());
s.push_str(lhs);
*self = Cow::Owned(s);
}
self.to_mut().push_str(&rhs);
}
}
}
use std::borrow::Cow;
// check that Cow<'a, str> implements addition
#[test]
fn check_cow_add_cow() {
let borrowed1 = Cow::Borrowed("Hello, ");
let borrowed2 = Cow::Borrowed("World!");
let borrow_empty = Cow::Borrowed("");
let owned1: Cow<'_, str> = Cow::Owned(String::from("Hi, "));
let owned2: Cow<'_, str> = Cow::Owned(String::from("Rustaceans!"));
let owned_empty: Cow<'_, str> = Cow::Owned(String::new());
assert_eq!("Hello, World!", borrowed1.clone() + borrowed2.clone());
assert_eq!("Hello, Rustaceans!", borrowed1.clone() + owned2.clone());
assert_eq!("Hi, World!", owned1.clone() + borrowed2.clone());
assert_eq!("Hi, Rustaceans!", owned1.clone() + owned2.clone());
if let Cow::Owned(_) = borrowed1.clone() + borrow_empty.clone() {
panic!("Adding empty strings to a borrow should note allocate");
}
if let Cow::Owned(_) = borrow_empty.clone() + borrowed1.clone() {
panic!("Adding empty strings to a borrow should note allocate");
}
if let Cow::Owned(_) = borrowed1.clone() + owned_empty.clone() {
panic!("Adding empty strings to a borrow should note allocate");
}
if let Cow::Owned(_) = owned_empty.clone() + borrowed1.clone() {
panic!("Adding empty strings to a borrow should note allocate");
}
}
#[test]
fn check_cow_add_str() {
let borrowed = Cow::Borrowed("Hello, ");
let borrow_empty = Cow::Borrowed("");
let owned: Cow<'_, str> = Cow::Owned(String::from("Hi, "));
let owned_empty: Cow<'_, str> = Cow::Owned(String::new());
assert_eq!("Hello, World!", borrowed.clone() + "World!");
assert_eq!("Hi, World!", owned.clone() + "World!");
if let Cow::Owned(_) = borrowed.clone() + "" {
panic!("Adding empty strings to a borrow should note allocate");
}
if let Cow::Owned(_) = borrow_empty.clone() + "Hello, " {
panic!("Adding empty strings to a borrow should note allocate");
}
if let Cow::Owned(_) = owned_empty.clone() + "Hello, " {
panic!("Adding empty strings to a borrow should note allocate");
}
}
#[test]
fn check_cow_add_assign_cow() {
let mut borrowed1 = Cow::Borrowed("Hello, ");
let borrowed2 = Cow::Borrowed("World!");
let borrow_empty = Cow::Borrowed("");
let mut owned1: Cow<'_, str> = Cow::Owned(String::from("Hi, "));
let owned2: Cow<'_, str> = Cow::Owned(String::from("Rustaceans!"));
let owned_empty: Cow<'_, str> = Cow::Owned(String::new());
let mut s = borrowed1.clone();
s += borrow_empty.clone();
assert_eq!("Hello, ", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
let mut s = borrow_empty.clone();
s += borrowed1.clone();
assert_eq!("Hello, ", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
let mut s = borrowed1.clone();
s += owned_empty.clone();
assert_eq!("Hello, ", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
let mut s = owned_empty.clone();
s += borrowed1.clone();
assert_eq!("Hello, ", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
owned1 += borrowed2;
borrowed1 += owned2;
assert_eq!("Hi, World!", owned1);
assert_eq!("Hello, Rustaceans!", borrowed1);
}
#[test]
fn check_cow_add_assign_str() {
let mut borrowed = Cow::Borrowed("Hello, ");
let borrow_empty = Cow::Borrowed("");
let mut owned: Cow<'_, str> = Cow::Owned(String::from("Hi, "));
let owned_empty: Cow<'_, str> = Cow::Owned(String::new());
let mut s = borrowed.clone();
s += "";
assert_eq!("Hello, ", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
let mut s = borrow_empty.clone();
s += "World!";
assert_eq!("World!", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
let mut s = owned_empty.clone();
s += "World!";
assert_eq!("World!", s);
if let Cow::Owned(_) = s {
panic!("Adding empty strings to a borrow should note allocate");
}
owned += "World!";
borrowed += "World!";
assert_eq!("Hi, World!", owned);
assert_eq!("Hello, World!", borrowed);
}
#[test]
fn check_cow_clone_from() {
let mut c1: Cow<'_, str> = Cow::Owned(String::with_capacity(25));
let s: String = "hi".to_string();
assert!(s.capacity() < 25);
let c2: Cow<'_, str> = Cow::Owned(s);
c1.clone_from(&c2);
assert!(c1.into_owned().capacity() >= 25);
let mut c3: Cow<'_, str> = Cow::Borrowed("bye");
c3.clone_from(&c2);
assert_eq!(c2, c3);
}
I think the impl for Send
on Cow
should be bounded by Send + Sync
or Sync
like it is for references and the std::Cow itself. (This issue was found by the rust group @sslab-gatech).
Line 531 in 0b46851
Without this it's possible to create references to a non-Sync
object like Cell
through the use of a borrowed Cow. For example, consider the following program (uses crossbeam_utils
to make dealing with threads easier):
#![forbid(unsafe_code)]
use crossbeam_utils::thread;
use std::cell::Cell;
use beef::Cow;
fn main() {
let x = [Cell::new(42)];
// A simple proof-of-concept demonstrating how a loose bound on Cow's
// Send trait allows non-`Sync` but `Send`able objects to be shared across
// threads.
thread::scope(|s| {
let cow1: Cow<[Cell<i32>]> = Cow::borrowed(&x[..]);
let cow2: Cow<[Cell<i32>]> = cow1.clone();
let child = s.spawn(|_| {
let smuggled = cow2.unwrap_borrowed();
smuggled[0].set(2);
// Print the value of the Cow-value and the address of the
// underlying Cell.
println!("{:?}, {:p}", smuggled, &smuggled[0]);
});
child.join().unwrap();
// This should print the same address as above indicating the underlying
// `Cell` in x is now shared across threads, a violation of `!Sync`.
println!("{:?}, {:p}", x, &x[0]);
});
}
This produces the output:
[Cell { value: 2 }], 0x7ffd1591f4cc
[Cell { value: 2 }], 0x7ffd1591f4cc
While this example is pretty benign, here's how it can lead to an arbitrary memory read from safe rust code:
#![forbid(unsafe_code)]
use crossbeam_utils::thread;
use std::cell::Cell;
use beef::Cow;
static SOME_INT: u64 = 123;
fn main() {
// A simple tagged union used to demonstrate the problems with data races
// in Cell. Cell is designed for single threads and has no synchronization
// methods. Thus if it is allowed to be used simaltaneously by two threads,
// it is possible to race its interior mutability methods to derference an
// arbitrary pointer.
#[derive(Debug, Clone, Copy)]
enum RefOrInt<'a> {
Ref(&'a u64),
Int(u64),
}
let cell_array = [Cell::new(RefOrInt::Ref(&SOME_INT))];
thread::scope(|s| {
let cow1: Cow<[Cell<RefOrInt>]> = Cow::borrowed(cell_array.as_ref());
let cow2: Cow<[Cell<RefOrInt>]> = cow1.clone();
let child = s.spawn(move |_| {
// We've now smuggled the cell from above into this thread.
let smuggled_cell = cow2.unwrap_borrowed();
loop {
// Repeatedly write Ref(&addr) and Int(0xdeadbeef) into the cell.
smuggled_cell[0].set(RefOrInt::Ref(&SOME_INT));
smuggled_cell[0].set(RefOrInt::Int(0xdeadbeef));
}
});
loop {
let main_thread_cell = (*cow1)[0].clone().into_inner();
if let RefOrInt::Ref(addr) = main_thread_cell {
// Hope that between the time we pattern match the object as a
// `Ref`, it gets written to by the other thread.
if addr as *const u64 == &SOME_INT as *const u64 {
continue;
}
// Due to the data race, obtaining Ref(0xdeadbeef) is possible
println!("Pointer is now: {:p}", addr);
println!("Dereferencing addr will now segfault: {}", *addr);
}
}
});
}
This example produces:
Borrowed mutably! - Main
Initial destructure: 42
Borrowed mutably! - Thread
About to dereference ptr again
Return Code: -11 (SIGSEGV)
because there is a semantic difference between &mut self as *mut Self
and Self::into_raw(self)
. The API Beef
uses should at the very least be forwards compatible with being implemented in terms of into_raw_parts
, which will also help safeguard against misuse. I would suggest functions isomorphic to into_raw(owned: Self::Owned) -> (NonNull<Self>, Option<NonZeroUsize>)
and from_raw(ptr: NonNull<Self>, cap: NonZeroUsize) -> Self::Owned
.
To be fair, this is probably a lot more benign than the Rc case which used &T as *const T
, but this is still problematic. IIUC, the actual UB under Stacked Borrows would come when dropping the owned value, as your pointer no longer actually owns the place it points to, and merely borrows it. Except... mem::forget
counts as a use of the pointer per SB, I think. cc @RalfJung if you don't mind?
Roughly, and abusing the SB notation, here's the operations I think go on:
[Uniq]
&mut
ref taken to call owned_ptr
. Borrow stack: [Uniq, Mut]
&mut ref
is converted into a raw red. Borrow stack: [Uniq, Mut, Raw]
mem::forget
. This results in a fork, because I don't know the exact retagging semantics here.
[Uniq]
, and any further use of the derived raw pointer is UB as it is no longer on the borrow stack. (I think this is the actual operational case (as opposed to abstract machine case) iff the owned pointer is marked noalias
for LLVM. This is why I'm abusing the borrow stack notation to distinguish the owned pointer from raw pointers.)[Uniq, Mut, Raw]
rebuild
. Borrow stack: [Uniq, Mut, Raw, Uniq]
Uniq
are never used.So I've talked myself from "(probably)" to "(potentially)", but I'd still advise changing the API surface forwards compatible with the definitely-not unsound into_raw_parts
. At the very least, it avoids a misuse vector for the API.
With apologies to the Stacked Borrow team for my abuse of their notation above.
Messing around between the fat pointer and thin pointer is quite a lot of work, and there is quite some deep coupling between generic::Cow
, the Beef
trait and the Capacity
. Capacity
is also a poor name for what the trait is doing now.
A reasonable logical split would be:
generic::Cow
implements the API surface, methods like borrow
, owned
and into_owned
, as well as all the traits one expects.Beef
converts borrowed and owned values into parts (pointer, len, cap) and back.Capacity
to something like Container
, have it store (pointer, len, cap) from Beef
internally so that the particulars of how capacity is handled doesn't leak out to other abstractions.In a few places I'm using code like
match cow of {
Cow::Owned(o) -> // do something with the owned thingy
Cow::Borrowed(b) -> // do something with the borrowed thingy (keeping it's lifetime)
}
I can't find a way to do that in beef
Is there a chance to get a function along the lines of:
fn a_good_name_for_this(self) -> Option<&'a T> {
...
}
The following traits are missing:
PartialOrd
Ord
for beef::generic::Cow
.
Should the following function signature not use a &'static lifetime? You can't allocate in const so all str must be static. This might even improve performance.
pub const fn const_str(val: &'a str) -> Self
I have some code at the moment where I can't swap from std::borrow::Cow, because the inferring lifetime parameter can't see that const_str provides a Cow<&'static>.
//use beef::lean::Cow; -- compile error
use std::borrow::Cow;
struct Data<'a> {
data: String,
follower: Option<Cow<'a, str>>,
}
fn main() {
let mut data = Data {
data: "hello".into(),
follower: None,
};
data.follower = Cow::from(&data.data).into();
}
In this code, follower has lifetime 'data
and data
has lifetime 'data
If i use beef::Cow
โ compiler tells me:
error[E0597]: `data.data` does not live long enough
--> src\main.rs:14:31
|
14 | data.follower = Cow::from(data.data.as_str()).into();
| ^^^^^^^^^^^^^^^^^^ borrowed value does not live long enough
15 | }
| -
| |
| `data.data` dropped here while still borrowed
| borrow might be used here, when `data` is dropped and runs the destructor for type `Data<'_>`
One test fails when building on i686 or armv7hl
---- src/lib.rs - (line 26) stdout ----
Test executable failed (exit code 101).
stderr:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `12`,
right: `8`', src/lib.rs:10:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
src/lib.rs - (line 26)
You can see the full logs at https://koji.fedoraproject.org/koji/taskinfo?taskID=70819940 and https://koji.fedoraproject.org/koji/taskinfo?taskID=70819939 (look at build.log).
std Cow
implements the Default
trait, which basically wraps the default version of the borrowed variant. It would be nice to have it also implemented for beef::Cow
, to allow using it in the contexts where this trait is required.
In the case of Beef::borrowed
can be implemented const Deref
. Will it be done?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.