fitzgen / bumpalo
A fast bump allocation arena for Rust
Home Page: https://docs.rs/bumpalo
License: Apache License 2.0
I agree that the fix made in #45 is the right one; however, it has a problem: it can introduce padding bytes in the output of `iter_allocated_chunks` for large alignments.
I think it is fine to introduce padding for large alignments, but we need to decide where to draw the line between supported and not supported -- and we need to improve the documentation to reflect this.
I do not have any strong opinions on how large an alignment we should support. I guess it would be nice to support up to alignment 16, since that is the largest alignment that real-world code can sometimes need. However, I do not think that anybody who has 16-byte alignment requirements would also want to use `iter_allocated_chunks`.
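To make the padding concrete, here is a small self-contained sketch (not bumpalo's actual code, and written with an upward bump for simplicity; the arithmetic is the same in either direction): whenever the current pointer is not already aligned, the gap up to the next aligned address becomes padding bytes that `iter_allocated_chunks` would expose.

```rust
/// Round `ptr` up to the next multiple of `align` (align must be a power of two).
fn align_up(ptr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (ptr + align - 1) & !(align - 1)
}

/// Bytes of padding introduced before an allocation at `ptr` with `align`.
fn padding_for(ptr: usize, align: usize) -> usize {
    align_up(ptr, align) - ptr
}

fn main() {
    // An 8-aligned bump pointer needs no padding for align <= 8 ...
    assert_eq!(padding_for(0x1008, 8), 0);
    // ... but an allocation with align 16 at the same spot inserts 8 padding bytes.
    assert_eq!(padding_for(0x1008, 16), 8);
    println!("ok");
}
```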
Here is some code I used to test this:
use bumpalo::Bump;

const SUPPORTED_ALIGNMENTS: &[usize] = &[1, 2, 4, 8];

quickcheck::quickcheck! {
    fn test_alignment_chunks(sizes: Vec<usize>) -> () {
        for &alignment in SUPPORTED_ALIGNMENTS {
            let mut b = Bump::new();
            let mut sizes = sizes.iter().map(|&size| (size % 10) * alignment).collect::<Vec<_>>();
            for &size in &sizes {
                let layout = std::alloc::Layout::from_size_align(size, alignment).unwrap();
                let ptr = b.alloc_layout(layout).as_ptr() as *const u8 as usize;
                assert_eq!(ptr % alignment, 0);
            }
            for chunk in b.iter_allocated_chunks() {
                let mut remaining = chunk.len();
                while remaining > 0 {
                    let size = sizes.pop().expect("too many bytes in the chunk output");
                    assert!(remaining >= size, "returned chunk contained padding");
                    remaining -= size;
                }
            }
            assert_eq!(sizes.into_iter().sum::<usize>(), 0);
        }
    }
}
Hello
I was pondering how to make bumpalo work with multiple threads. Let's say I want to allocate a bunch of things, but do it in parallel, and then keep the results around for further processing. Single-threaded works fine, like this:
let bump = Bump::new();
let stuff: Vec<_> = (0..1_000_000).into_iter().map(|i| bump.alloc(i)).collect();
for i in &stuff {
    println!("{}", **i);
}
But this won't work, because `Bump` is not `Sync` (quite understandably):
let bump = Bump::new();
let stuff: Vec<_> = (0..1_000_000).into_par_iter().map(|i| bump.alloc(i)).collect();
for i in &stuff {
    println!("{}", **i);
}
`map_init` won't help either. While each thread now gets its own `Bump` to play with, the `Bump`s are dropped just after the iteration is done, so they can't be used afterwards.
I've done some experiments (some of them failed because bumpalo is so damn fast that even a single `fetch_add` is an order of magnitude slower) and finally came up with a PoC that works ‒ I wrap a pool of `Bump`s and borrow them out. When the borrow ends, the allocator is returned to the pool. Therefore the borrowed allocator can hand out allocations with the lifetime of the pool, so the code would be something like this:
let herd = Herd::new();
let stuff: Vec<_> = (0..1_000_000)
    .into_par_iter()
    .map_init(|| herd.get(), |bump, i| bump.alloc(i))
    .collect();
for i in &stuff {
    println!("{}", **i);
}
The PoC (it has no documentation, no tests, and only the `alloc` method so far, and I need to think hard and prove that what I do is actually sound) is here: https://github.com/vorner/bumpalo-herd
I want to write a blog post about it. But aside from that, I'd like to know: should I keep this as a separate crate and publish it myself, or would you like it added to bumpalo itself? I can send a PR with the implementation once I polish it.
The function `Layout::array` has been stabilized.
This means that the TODO in `lib.rs` can finally be fixed. The fix would mean that we no longer have to do defensive coding in `layout_for_array`. I think this might also be a small reduction in the generated code size, but I haven't actually tested it.
However, given that 1.44 has only just been released, now is probably not the time to begin depending on it. Do you think we should perhaps create a version policy for bumpalo?
In the latest Rust nightly we can allocate with a custom allocator (for example https://doc.rust-lang.org/nightly/alloc/vec/struct.Vec.html#method.new_in).
It would be great for bumpalo to provide the ability to use this. To achieve this, `&Bump` needs to implement `std::alloc::AllocRef`.
I find I end up writing
let x = bumpalo::format!(in cx.bump, "{}", var).into_bump_str();
a lot, so my suggestion is to add
trait ToBumpString: Display {
    fn to_bump_str<'a>(&self, arena: &'a Bump) -> &'a str {
        bumpalo::format!(in arena, "{}", self).into_bump_str()
    }
}
matching `ToString` (except that we want a `&str`).
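For comparison, this mirrors the blanket-impl pattern behind `ToString`. A self-contained model of that pattern, with std's `String` standing in for the arena-backed string (the real method would take a `&'a Bump` and return a `&'a str`):

```rust
use std::fmt::Display;

// Hypothetical sketch: the real trait would return a `&'a str` allocated in
// a `Bump`; here std's `String` stands in so the example is self-contained.
trait ToBumpString: Display {
    fn to_bump_string(&self) -> String {
        format!("{}", self)
    }
}

// Blanket impl, exactly as `ToString` does it for every `Display` type.
impl<T: Display + ?Sized> ToBumpString for T {}

fn main() {
    assert_eq!(42.to_bump_string(), "42");
    assert_eq!("hi".to_bump_string(), "hi");
    println!("{}", 42.to_bump_string());
}
```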
I was wondering if it might be possible to write a variant of the `reset` function that does not deallocate memory. Instead it would go through all the chunks and set `chunk.ptr` to `chunk.data`. Afterwards it would set `bump.current_chunk_footer` to `bump.all_chunk_footers`.
For this to make sense, we would have to modify either `alloc_layout_fast` or `alloc_layout_slow` to check for additional chunks before allocating a new one.
We could for instance insert code similar to this at the beginning of `alloc_layout_slow`:
while let Some(next) = self.current_chunk_footer.get().as_ref().next.get() {
    self.current_chunk_footer.set(next);
    if let Some(ptr) = self.try_alloc_layout_fast(layout) {
        return ptr;
    }
}
Alternatively we could do a similar loop directly inside `try_alloc_layout_fast`.
What do you think? Would you accept a PR implementing this?
For context: I am looking into the possibility of implementing the flatbuffers crate on top of bumpalo. One of the core features of the library is the fact that once it has warmed up, it will not need to allocate additional memory to serialize further messages -- and this feature is not currently available in bumpalo.
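A crude, self-contained model of the proposed behavior: retain the chunk list on reset, rewind each chunk's cursor, and walk retained chunks before growing. All names here are invented for illustration and are not bumpalo's internals; allocations are assumed to be no larger than the chunk size.

```rust
/// Toy arena: each chunk is a fixed-size buffer plus a cursor.
struct Chunk { buf: Vec<u8>, used: usize }

struct ToyArena { chunks: Vec<Chunk>, current: usize, chunk_size: usize }

impl ToyArena {
    fn new(chunk_size: usize) -> Self {
        ToyArena { chunks: vec![Chunk { buf: vec![0; chunk_size], used: 0 }], current: 0, chunk_size }
    }

    /// Allocate `n` bytes (n <= chunk_size), trying retained chunks before growing.
    fn alloc(&mut self, n: usize) -> &mut [u8] {
        while self.chunks[self.current].used + n > self.chunk_size {
            if self.current + 1 < self.chunks.len() {
                self.current += 1; // reuse an already-allocated chunk
            } else {
                self.chunks.push(Chunk { buf: vec![0; self.chunk_size], used: 0 });
                self.current += 1;
            }
        }
        let c = &mut self.chunks[self.current];
        let start = c.used;
        c.used += n;
        &mut c.buf[start..start + n]
    }

    /// Rewind all cursors but keep every chunk's memory.
    fn reset_retaining(&mut self) {
        for c in &mut self.chunks { c.used = 0; }
        self.current = 0;
    }

    fn chunk_count(&self) -> usize { self.chunks.len() }
}

fn main() {
    let mut a = ToyArena::new(16);
    for _ in 0..10 { a.alloc(8); } // warm-up pass forces growth
    let warmed = a.chunk_count();
    a.reset_retaining();
    for _ in 0..10 { a.alloc(8); } // second pass needs no new chunks
    assert_eq!(a.chunk_count(), warmed);
    println!("chunks after warm-up and reuse: {}", warmed);
}
```

Once warmed up, a second pass over the same workload allocates no new chunks, which is exactly the flatbuffers-style property described above.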
Noticed this while checking the license before running `mach vendor`.
It seems that for `T: Sync`, `Vec<'bump, T>` should implement `Sync` as well.
With #37 implemented, I think bumpalo is much closer to being a possible backend for flatbuffers. However there are still a few features that would be needed, some of which might require guarantees that bumpalo is unable to provide.
I am hoping we can use this issue to discuss whether this sounds realistic or not. To use bumpalo, we would need:
- An `allocated_bytes()` method on the `Bump`, or a method that returns both a pointer and the actual number of bytes allocated.
- Perhaps a `fn alloc_padding(&self, align: usize) -> &mut [u8]` which gives a slice with `0 <= len < align`?

What do you think? Does it make sense to continue pursuing this?
See later post.
The following code will panic inside `try_alloc_layout_fast` on line 612 when `debug_assertions` are turned on:
fn main() {
    let layout = std::alloc::Layout::from_size_align(1, 0x1000).unwrap();
    bumpalo::Bump::new().alloc_layout(layout);
}
`MaybeUninit` wasn't stable when these APIs were added. Now it is.
Note that this is a breaking change and will require a version bump.
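For reference, the shape of the now-stable API (standard library only, nothing bumpalo-specific): reserve space first, initialize it explicitly, and only then assert initialization.

```rust
use std::mem::MaybeUninit;

// Reserve space without initializing it, write into it later, and only then
// assert that it is initialized.
fn init_later() -> u64 {
    let mut slot: MaybeUninit<u64> = MaybeUninit::uninit();
    slot.write(42); // explicit initialization
    // SAFETY: `slot` was just written above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(init_later(), 42);
    println!("{}", init_later());
}
```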
Was just wondering if there is some limitation preventing the `io::Write` trait from being implemented on the `Vec` fork in the bumpalo crate? Is there any hidden complexity somewhere that I am not aware of?
Thank you very much for this crate, it's pretty useful and well-documented.
I believe the implementation made in #92 could use a specific `grow_zeroed` implementation.
If I read https://github.com/rust-lang/rust/blob/ee88f46bb5e27c4d9f30326e69ff2298dcbeecb1/library/core/src/alloc/mod.rs#L253-L277 correctly, it could use a more efficient one. As currently implemented, it zeroes `new_layout.size()` bytes where it could zero only `new_layout.size() - old_layout.size()`.
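The distinction can be modeled without any allocator machinery: on grow, only the bytes past the old length need zeroing, since the old bytes are copied over intact. A sketch with an invented helper name:

```rust
/// Grow `buf` to `new_len`, zeroing only the newly exposed tail rather than
/// the whole buffer. Returns how many bytes were zeroed.
fn grow_zeroed_delta(buf: &mut Vec<u8>, new_len: usize) -> usize {
    let old_len = buf.len();
    assert!(new_len >= old_len);
    buf.resize(new_len, 0); // zeroes only new_len - old_len bytes
    new_len - old_len
}

fn main() {
    let mut buf = vec![1u8; 8];
    let zeroed = grow_zeroed_delta(&mut buf, 20);
    assert_eq!(zeroed, 12);                    // new size minus old size
    assert!(buf[..8].iter().all(|&b| b == 1)); // old data untouched
    assert!(buf[8..].iter().all(|&b| b == 0)); // only the tail is zeroed
    println!("zeroed {} bytes", zeroed);
}
```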
Copying the miri CI script from the `typed-arena` crate:
set -ex
MIRI_NIGHTLY=nightly-$(curl -s https://rust-lang.github.io/rustup-components-history/x86_64-unknown-linux-gnu/miri)
echo "Installing latest nightly with Miri: $MIRI_NIGHTLY"
rustup default "$MIRI_NIGHTLY"
cargo clean
rustup component add miri
cargo miri setup
cargo miri test
Still waiting on the full test suite to finish running; this is taking quite a bit longer than I'd anticipated.
However, the `realloc` tests are definitely failing, and I'm not sure I totally understand what miri is complaining about here. cc @RalfJung
error: Miri evaluation error: a raw memory access tried to access part of a pointer value as raw bytes
--> /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/intrinsics.rs:1549:5
|
1549 | copy(src, dst, count)
| ^^^^^^^^^^^^^^^^^^^^^ Miri evaluation error: a raw memory access tried to access part of a pointer value as raw bytes
|
note: inside call to `core::intrinsics::copy::<u8>` at src/lib.rs:1070:17
--> src/lib.rs:1070:17
|
1070 | ptr::copy(ptr.as_ptr(), p.as_ptr(), new_size);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside call to `<&Bump as alloc::Alloc>::realloc` at src/lib.rs:1118:21
--> src/lib.rs:1118:21
|
1118 | let q = (&b).realloc(p, layout, 11).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside call to `tests::test_realloc` at src/lib.rs:1094:5
--> src/lib.rs:1094:5
|
1094 | / fn test_realloc() {
1095 | | use crate::alloc::Alloc;
1096 | |
1097 | | unsafe {
... |
1138 | | }
1139 | | }
| |_____^
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232:5
= note: inside call to `<[closure@src/lib.rs:1094:5: 1139:6] as core::ops::FnOnce<()>>::call_once - shim` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232:5
= note: inside call to `<fn() as core::ops::FnOnce<()>>::call_once - shim(fn())` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:543:5
= note: inside call to `tests::test::__rust_begin_short_backtrace::<fn()>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:534:30
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232:5
= note: inside call to `<[closure@DefId(5:635 ~ test[cf55]::run_test[0]::{{closure}}[3]) 0:fn()] as core::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:1022:9
= note: inside call to `<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send> as core::ops::FnOnce<()>>::call_once` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:318:9
= note: inside call to `<std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>> as core::ops::FnOnce<()>>::call_once` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:40
= note: inside call to `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>>, ()>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:270:13
= note: inside call to `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>>>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
= note: inside call to `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>>, ()>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:567:18
= note: inside call to `tests::test::run_test_in_process` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:474:21
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:495:13
= note: inside call to `tests::test::run_test::run_test_inner` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:531:28
= note: inside call to `tests::test::run_test` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:300:13
= note: inside call to `tests::test::run_tests::<[closure@DefId(5:427 ~ test[cf55]::console[0]::run_tests_console[0]::{{closure}}[2]) 0:&mut tests::test::console::ConsoleTestState, 1:&mut core_alloc::boxed::Box<dyn tests::test::formatters::OutputFormatter>]>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/console.rs:295:5
= note: inside call to `tests::test::run_tests_console` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:121:15
= note: inside call to `tests::test::test_main` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:140:5
= note: inside call to `tests::test::test_main_static`
= note: inside call to `main` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:34
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:73
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:136:5
= note: inside call to `std::sys_common::backtrace::__rust_begin_short_backtrace::<[closure@DefId(6:6014 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]::{{closure}}[0]) 0:&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:13
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:40
= note: inside call to `std::panicking::r#try::do_call::<[closure@DefId(6:6013 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:270:13
= note: inside call to `std::panicking::r#try::<i32, [closure@DefId(6:6013 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe]>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
= note: inside call to `std::panic::catch_unwind::<[closure@DefId(6:6013 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:51:25
= note: inside call to `std::rt::lang_start_internal` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:5
= note: inside call to `std::rt::lang_start::<()>`
error: aborting due to previous error
test tests::test_realloc ...
error: could not compile `bumpalo`.
To learn more, run the command again with --verbose.
comment from `reset()`:
"/// If this arena has allocated multiple chunks to bump allocate into, then
/// the excess chunks are returned to the global allocator."
It would be good to have an option to reuse all of the chunks instead of returning them to the global allocator. Perhaps the assumption in the code is that allocating chunks is cheap. This is not necessarily true for large allocations - in that case the cost is more proportional to the size of the allocation, and the number of `alloc()` calls becomes less relevant; e.g. 1 alloc of 1 GB costs about the same as 1024 allocs of 1 MB. Most of the work is in the soft page faults.
Is the image in the readme a logo? If so, it is interesting. At first glance I'd prefer to call it a "bullfight". Hahahahaha. I just thought the project was interesting, so I wanted to say something about it.
I tried to add these `try_*` methods (since I need fallible allocation) to `Vec` by wrapping the `RawVec` methods of the same names. However, they still seem to be infallible due to the alloc impl always returning `Ok`? https://github.com/fitzgen/bumpalo/blob/master/src/lib.rs#L1076
Would it be possible to make them fallible?
Hello, while I'm using the `Vec`s and `String`s of bumpalo, I realized that the `std::borrow::Borrow` trait is not implemented for these types even though `std::convert::AsRef` is implemented.
I guessed that the following statement in `Borrow`'s documentation might be related, but as far as I checked the code, bumpalo's collections fulfill the condition:
In particular `Eq`, `Ord` and `Hash` must be equivalent for borrowed and owned values: `x.borrow() == y.borrow()` should give the same result as `x == y`.
Does anyone know the reason why it is not implemented?
Thanks,
Would it be feasible to have the bumpalo `Vec` type work with serde? That way we can handle deserializing variable-length sequences of references to borrowed data. That would be a big improvement over using the global allocator.
Here are three example types. The first is not Deserialize with serde, because it requires borrowing a slice of references to strings, which is a variable-length data structure. The second is Deserialize with serde, but it causes expensive heap allocations. The third could be possible using bumpalo.
#[derive(Serialize, Deserialize)]
struct NotSerde<'a> {
    #[serde(borrow)]
    items: &'a [&'a str],
}

#[derive(Serialize, Deserialize)]
struct YesSerdeButRequiresHeapAllocation<'a> {
    #[serde(borrow)]
    items: Vec<&'a str>,
}

#[derive(Serialize, Deserialize)]
struct YesSerdeWithBumpalo<'a> {
    #[serde(borrow)]
    items: bumpalo::collections::Vec<'a, &'a str>,
}
It would be very convenient to support allocating `OsStr` and `Path` instances in a `Bump`.
Unfortunately the underlying byte representation isn't exposed on Windows, so we'll probably have to return `Cow<'bump, OsStr>` or `Cow<'bump, Path>`, which may defeat the purpose :(
Instead of forking from libstd directly ourselves.
This would involve merging `String` upstream into that crate, plus any other collections we might end up needing.
I'm working with an application that prints diagnostic messages after running an operation to show how much memory was used during that operation, then proceeds on with the next operation. However, because the next operation relies on memory allocated in the previous operation, there are still active immutable borrows from the `Bump` allocator at the point where diagnostic information should be logged. Therefore, I cannot simply use `arena.iter_allocated_chunks().map(|chunk| chunk.len()).sum()` to return the allocated size, because that requires a mutable borrow.
Would you be willing to add a method for querying such information from an immutable `Bump` instance? Something like `Bump::allocated_bytes(&self) -> usize`.
The current `new` methods use the `Global` allocator to get memory to work with. Provide an `unsafe` way to create a bump allocator from a slice denoting unused memory.
Is there a fast way to copy objects from one bump into another bump, beyond just cloning and allocating?
Or, would it be better to just keep a bump around for each set of objects (i.e. for caching)?
We should make this work out so that safety is guaranteed by the types. So a sub-arena should exclusively borrow its parent (whether that is the top-level arena or another sub-arena). And the sub-arena shouldn't expose its parent's lifetime, so that references allocated through the sub-arena cannot be used after we drop the sub-arena, and therefore it is safe for the sub-arena's `Drop` impl to reset the bump pointer to the point it was at when the sub-arena was constructed.
It is unclear how to add support for backing `bumpalo::collections::Vec` et al. with a sub-arena, and even whether this is desirable versus just waiting for custom allocators to stabilize, now that there is movement there.
cc @cfallin
If I understand correctly, two `Vec`s from the same `Bump` have separate allocations, correct? It seems like you should be able to consume them into distinct mutable slices, just like you can currently do immutably with `into_bump_slice`.
I am considering writing a function `unsafe fn iter_allocated_chunks(&mut self) -> ChunkIterator<'a>` and then re-implementing `each_allocated_chunk` on top of this function.
Would you be interested in accepting a PR that does this?
Within reason. Should probably have a max chunk size.
To reduce memory allocations, I'm trying to use bumpalo to create my own Boxed futures type.
I'm basing this off of an idiom present in the futures crate:
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a + Send>>;
https://docs.rs/futures/0.3.5/futures/future/type.BoxFuture.html
When I try this with `bumpalo::boxed::Box`, however, it seems that I can't convert an `impl Future` into a `dyn Future` the same way that I can with `std::boxed::Box`.
Here is some example code demonstrating the issue (I wish the playground had bumpalo!):
use std::future::Future;
use std::pin::Pin;

/// This compiles:
fn f(
    fut: impl Future<Output = ()> + Send + 'static,
) -> Pin<Box<dyn Future<Output = ()> + Send + 'static>> {
    Box::pin(fut)
}

// This does not compile:
// fn g<'a>(
//     b: &'a bumpalo::Bump,
//     fut: impl Future<Output = ()> + Send + 'static,
// ) -> Pin<bumpalo::boxed::Box<'a, dyn Future<Output = ()> + Send + 'static>> {
//     bumpalo::boxed::Box::pin_in(fut, &b)
// }
Is there something I'm missing with how to use this library? Thank you!
Currently, to bump-allocate any slice an existing slice is needed, which is not always convenient. I guess the current workaround is to use the `alloc_layout` method, but a method that doesn't require `unsafe` would be handy. Another handy addition would be a method that allocates a slice and fills it with copies/clones of a single value.
Not really an issue, but I thought I'd report it all the same, for the sake of symmetry with `std::vec!`.
Is there a recipe for creating an `Rc` with bumpalo? I'm trying to avoid the heap allocation(s) that happen inside of `Rc`.
(For reference, using the `Rc::from_box` function still appears to cause heap allocations.)
Thanks!
I'm very confused and don't know what is wrong. When I run this code in a unit test with `cargo test`, it fails with SIGSEGV. Here is my test code:
https://github.com/Riey/bumpalo/tree/unit-test-failed
But when I run the same code in `main.rs`, it succeeds.
Asking this because I'd like to store references into the bump along with the bump itself, via `OwningRef` (or by raw pointer directly).
let bumps: Vec<OwningRef<Bump, T>>;
// or, more unsafely but more efficiently
let bumps: Vec<(Bump, Vec<*mut T>)>;
If `bumps` grows, it will need to be resized. Will the references into the bump remain stable, meaning it's still okay to cast `*mut T` back to `&mut T`?
If not, how would you go about stably storing the bump alongside objects it has allocated?
I know Dodrio does this for `CachedNodes`, but I can't help but think that that particular implementation is unsound if the `CachedNodes` `FxHashMap` needs to be resized.
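The question hinges on whether moving the owner moves the owned memory; it does not, which can be checked directly with `Box` standing in for `Bump` (both own their storage through a pointer). Note this only shows address stability; the aliasing soundness of casting `*mut T` back to `&mut T` is a separate question this sketch does not answer.

```rust
// Record the address of a heap allocation, then force the Vec holding its
// owner to reallocate repeatedly, and check the address is unchanged.
fn first_addr_stable(pushes: u32) -> bool {
    let boxed = Box::new(123u32);
    let addr = &*boxed as *const u32 as usize;

    let mut holder = vec![boxed];
    for i in 0..pushes {
        holder.push(Box::new(i));
    }

    // The Vec's buffer moved, but the heap allocation did not.
    &*holder[0] as *const u32 as usize == addr
}

fn main() {
    assert!(first_addr_stable(10_000));
    println!("stable");
}
```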
We could use const generics to control the size, perhaps.
This means that the initial N bytes of allocation wouldn't need to hit the global allocator.
cc @cfallin
I wanted to add this test in #42, but it failed:
#[test]
fn test_alignment() {
    let b = Bump::new();
    let layout = std::alloc::Layout::from_size_align(64, 64).unwrap();
    for _ in 0..1024 {
        let ptr = b.alloc_layout(layout).as_ptr();
        assert_eq!(ptr as *const u8 as usize % 64, 0);
    }
}
I would be grateful if you could explain the difference between `std::vec::Vec` and `bumpalo::collections::vec::Vec`. Does `bumpalo::collections::vec::Vec` exist just to support `no_std`?
I suppose I am confused as to how the following lines would differ in terms of allocating a vector within a bump arena.
let bump = Bump::new();
bump.alloc(std::vec::Vec::new());

let bump = Bump::new();
bumpalo::collections::vec::Vec::new_in(&bump);
Hi,
Is there any reason Bumpalo allocates from high to low addresses? Or could this be some policy on which bumpalo is parameterized?
I would like to use something like bumpalo as an abstraction for a zero-copy message-encoding allocator. I have a proof-of-concept using Vec<Box<[u64]>>
today, with similar mut borrows for allocations (but much less intelligent resizing-in-place / new chunk sizing logic). If I can ditch my unsafe code for something with much smarter minds behind it, I’d really prefer that!
I think the only real change needed would be to adjust the internal ptr cursor initialization and replace some checked_sub with checked_add. But I don’t want to fork the crate just to do that.
Thoughts? I looked for closed issues and didn’t see the question raised earlier. Thanks!
Edit: I found and am starting to read https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html , but have not finished yet.
Edit2: The post's additional branch for bumping up only makes sense if you allow huge, nonsensical alignments in tandem with backing chunks arbitrarily close to `usize::MAX as *mut _`. If you limit alignments to reasonable values (like 64, which is the most you need for AVX-512, or maybe 4 kB at most), and ensure your allocated chunks end at or below `usize::MAX - MAX_ALIGN`, you can skip the extra condition on each individual allocation.
(My requirements happen to be even less general than that -- all allocations have the same, u64 alignment, and we could monomorphize on that, and the alignment branches go away anyway. But I understand that isn't generally true for bumpalo consumers. A version of bumpalo that monomorphized on some T might be generally useful?) Thanks!
Edit3: I think you could also just use a different alignment algorithm bumping up in the general case and avoid the extra branch. Here is the original algorithm described in the blog post:
pub unsafe fn alloc(
    &mut self,
    size: usize,
    align: usize,
) -> *mut u8 {
    debug_assert!(align > 0);
    debug_assert!(align.is_power_of_two());

    let ptr = self.ptr as usize;

    // Round the bump pointer up to the requested alignment.
    let aligned = try_null!(ptr.checked_add(align - 1)) & !(align - 1);
    let new_ptr = try_null!(aligned.checked_add(size));

    let end = self.end as usize;
    if new_ptr > end {
        // Didn't have enough capacity!
        return std::ptr::null_mut();
    }

    self.ptr = new_ptr as *mut u8;
    aligned as *mut u8
}
Here is a similar algorithm that avoids the extra alignment check while bumping up:
pub unsafe fn alloc2(
    &mut self,
    size: usize,
    align: usize,
) -> *mut u8 {
    debug_assert!(align > 0);
    debug_assert!(align.is_power_of_two());
    debug_assert!(size <= isize::MAX as _ && align <= isize::MAX as _);

    let ptr = self.ptr as usize;

    // Round the bump pointer down to the requested alignment. Ptr is
    // non-null, so we can subtract one without underflow. The resulting
    // aligned ptr is always exactly `align` below the actual ptr we will
    // allocate.
    let aligned = (ptr - 1) & !(align - 1);

    // Size and align must be <= isize::MAX in Rust; their addition cannot
    // overflow usize.
    let new_ptr = try_null!(aligned.checked_add(size + align));

    let end = self.end as usize;
    if new_ptr > end {
        // Didn't have enough capacity!
        return std::ptr::null_mut();
    }

    self.ptr = new_ptr as *mut u8;

    // If new_ptr didn't overflow, this cannot overflow either:
    (aligned + align) as *mut u8
}
And indeed, Godbolt shows it has two branches instead of three, like bumping down. It is still a handful of instructions longer than bumping down (13 -> 17), but none of them are branches.
The essential insight is that `(ptr + (A - 1)) & !(A - 1) == ((ptr - 1) & !(A - 1)) + A` due to modular arithmetic.
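The claimed identity can be checked exhaustively for small pointers and power-of-two alignments:

```rust
// Round-up-to-alignment versus round-down-then-add-alignment. `ptr` starts
// at 1 because the downward form subtracts one (matching the non-null
// precondition noted above).
fn identity_holds(ptr: usize, align: usize) -> bool {
    let round_up = (ptr + (align - 1)) & !(align - 1);
    let down_then_add = ((ptr - 1) & !(align - 1)) + align;
    round_up == down_then_add
}

fn main() {
    for align_pow in 0..7u32 {
        let align = 1usize << align_pow;
        for ptr in 1..4096usize {
            assert!(identity_holds(ptr, align), "ptr={} align={}", ptr, align);
        }
    }
    println!("identity holds");
}
```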
As far as gaming the microbenchmark, both bumping up and down might benefit from LLVM generating more in-order code for the non-error cases? But maybe it knows more about optimizing x86 branches than I do.
The current logo is a bit misleading, since it gives the initial impression that bumpalo allocates forwards. I suggest that the logo be flipped (except for the numbers, of course).
Thanks for making a lovely library!
I don't know enough about the internals to know whether `Bump` could safely implement `Sync` (the way it now does `Send`), but I recently found myself wanting to share a single `&Bump` between threads, so that several threads could allocate into the same arena in parallel, and then once all the threads had been joined, there could be additional processing done on the arena before dropping it.
Is it conceivable that `Bump` could implement `Sync` as well as `Send`? Or is there maybe a technique I'm not thinking of where it's possible to achieve the same result (multiple threads bump-allocating in parallel, and then using that memory on the main thread once the threads have completed) without `Bump` implementing `Sync`?
The copyless crate allows one to allocate a large struct on the heap without first creating it on the stack and then copying it. One should be able to do this with bumpalo too.
If you try to run this code, it will cause a stack overflow exception (even though we want the data on the heap):
fn main() {
    let bump = bumpalo::Bump::new();
    bump.alloc([0; 1_000_000_000]);
}
The copyless crate solves this by relying on the fact that LLVM seems to be able to optimize `x = FOO; memcpy(ptr, &x, sizeof(x));` into `*ptr = FOO`.
On the other hand, LLVM does not seem to be able to optimize `x = FOO; ptr = alloc(sizeof(x)); memcpy(ptr, &x, sizeof(x));` in the same way, and instead relies on the stack to store the value of `FOO`.
We have had a bunch of great work committed to master that I'd like to publish, and some of it is breaking (the reversal of bump direction, for example). Because it is breaking, I'd like to double check that we aren't missing any breaking changes that we can coalesce into a single release, rather than eventually doing multiple breaking releases.
@TethysSvensson, anything else on your radar? Anything else needed for your flatbuffers effort?
cc @vorner
`alloc_layout_slow` still contains a `dbg!` left over from debugging.
Hello
It seems that bumpalo's `Box` doesn't implement `CoerceUnsized` like the usual `Box` does. I'm not sure how one should be able to create upcastable smart pointers otherwise, as that's a nightly-only API :-(.
This code does not compile (and I can't figure out any equivalent that does):
use std::fmt::Debug;

use bumpalo::Bump;
use bumpalo::boxed::Box as BumpBox;

#[derive(Debug)]
struct X;

fn get_unsized(arena: &Bump) -> BumpBox<dyn Debug> {
    BumpBox::new_in(X, arena)
}
This fails with `mismatched types`. Trying `into()` and similar doesn't help.
I've looked at this example around `downcast`, and I'm pretty sure the example uses the ordinary `std::boxed::Box`, not bumpalo's one.
Is there a way to do the upcast that I'm not seeing? Should the example be modified to use that?
Hello
I was interested in how the thing works inside, so I read the code. I discovered this: https://github.com/fitzgen/bumpalo/blob/master/src/lib.rs#L539.
Now, if I understand it correctly, this can panic „unexpectedly“, even if the user didn't do anything outright wrong. In that case, the address will overflow and a panic will happen.
Would it make sense not to panic, but simply return None in that case? The problem here is simply that the allocation doesn't fit into the current chunk, and a new chunk needs to be created.
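The suggested behavior, treating overflow as "does not fit in this chunk", falls out naturally from checked arithmetic. A self-contained sketch (invented names, bumping downward like bumpalo, not the crate's actual code):

```rust
/// Try to bump-allocate `size` bytes below `ptr` with a power-of-two `align`,
/// inside a chunk starting at `start`. Returns None (rather than panicking)
/// when the allocation does not fit, including on arithmetic overflow.
fn try_alloc_down(ptr: usize, start: usize, size: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    let new_ptr = ptr.checked_sub(size)?; // overflow just means "doesn't fit"
    let aligned = new_ptr & !(align - 1); // round down to alignment
    if aligned < start {
        return None; // not enough room in this chunk: caller grabs a new one
    }
    Some(aligned)
}

fn main() {
    // Fits: 16 bytes, 8-aligned, inside [0x1000, 0x1100).
    assert_eq!(try_alloc_down(0x1100, 0x1000, 16, 8), Some(0x10F0));
    // Overflow: size larger than the pointer value gives None, no panic.
    assert_eq!(try_alloc_down(0x100, 0x0, usize::MAX, 8), None);
    println!("ok");
}
```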
It would be great if `Bump` could provide an `alloc_extend` method to allocate values from an iterator and return a slice (similarly to `TypedArena::alloc_extend`). Something along the lines of:
pub fn alloc_extend<T, I>(&self, iterable: I) -> &mut [T]
where
    T: BumpAllocSafe,
    I: IntoIterator<Item = T>,
error[E0277]: a value of type `bumpalo::collections::Vec<'_, (&[u8], u64)>` cannot be built from an iterator over elements of type `(&[u8], u64)`
--> main.rs:73:61
|
73 | let mut ordered: Vec<(&[u8], u64)> = counts.into_iter().collect();
| ^^^^^^^ value of type `bumpalo::collections::Vec<'_, (&[u8], u64)>` cannot be built from `std::iter::Iterator<Item=(&[u8], u64)>`
|
= help: the trait `FromIterator<(&[u8], u64)>` is not implemented for `bumpalo::collections::Vec<'_, (&[u8], u64)>`
`counts` has type `HashMap<&'bump [u8], u64>`.
Can we implement `FromIterator` and other traits to support `collect`, for feature parity with the standard library?
Since `Bump` contains `NonNull`, it doesn't implement `Send`. This prevents it from being used in a thread-local store, for example.
It should be sound to implement the trait for `Bump`, since the only reason `NonNull` isn't `Send` is that it may be aliased, which `Bump` doesn't do.
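The shape of the fix is a manual `unsafe impl`. A self-contained model with a `NonNull`-holding owner type (names invented for illustration, not bumpalo's internals):

```rust
use std::ptr::NonNull;
use std::thread;

/// Toy owner of a heap allocation through a raw NonNull, like Bump's
/// chunk-footer pointer. NonNull<T> is !Send by default.
struct PtrOwner {
    head: NonNull<u64>,
}

// SAFETY (the argument from this issue): the pointer is uniquely owned by
// PtrOwner and never aliased, so moving it to another thread is sound.
unsafe impl Send for PtrOwner {}

impl PtrOwner {
    fn new(v: u64) -> Self {
        let raw = Box::into_raw(Box::new(v));
        PtrOwner { head: NonNull::new(raw).unwrap() }
    }

    fn get(&self) -> u64 {
        unsafe { *self.head.as_ref() }
    }
}

impl Drop for PtrOwner {
    fn drop(&mut self) {
        // Reclaim the allocation we uniquely own.
        unsafe { drop(Box::from_raw(self.head.as_ptr())) };
    }
}

fn main() {
    let owner = PtrOwner::new(7);
    // Without the `unsafe impl Send`, this move would not compile.
    let handle = thread::spawn(move || owner.get());
    assert_eq!(handle.join().unwrap(), 7);
    println!("sent across threads");
}
```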
I moved some data into bumpalo arenas. I had to change the struct definitions from using `std::vec::Vec` to `bumpalo::Vec`, and as a result I had to remove `#[derive(PartialEq)]` from those structs. Then I had to disable some tests that were using `assert_eq!`.
Suppose I implement these:
impl<'a, 'b, A, B> PartialEq<bumpalo::Vec<'b, B>> for bumpalo::Vec<'a, A>
where A: PartialEq<B> { ... }

impl<'a, 'b, A, B> PartialEq<&'b [B]> for bumpalo::Vec<'a, A>
where A: PartialEq<B> { ... }

impl<'a, 'b, A, B> PartialEq<&'b mut [B]> for bumpalo::Vec<'a, A>
where A: PartialEq<B> { ... }
Does that sound like a PR you'd take?
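These impls mirror what std's `Vec` already provides. The pattern can be demonstrated self-containedly, with a wrapper type standing in for `bumpalo::collections::Vec`:

```rust
struct MyVec<A>(Vec<A>); // stand-in for bumpalo::collections::Vec<'a, A>

// Vec-to-Vec comparison across two element types related by PartialEq.
impl<A, B> PartialEq<MyVec<B>> for MyVec<A>
where
    A: PartialEq<B>,
{
    fn eq(&self, other: &MyVec<B>) -> bool {
        self.0.len() == other.0.len()
            && self.0.iter().zip(other.0.iter()).all(|(a, b)| a == b)
    }
}

// Vec-to-slice comparison, as proposed above for `&'b [B]`.
impl<A, B> PartialEq<&[B]> for MyVec<A>
where
    A: PartialEq<B>,
{
    fn eq(&self, other: &&[B]) -> bool {
        self.0.len() == other.len()
            && self.0.iter().zip(other.iter()).all(|(a, b)| a == b)
    }
}

fn main() {
    let v = MyVec(vec![1u32, 2, 3]);
    assert!(v == MyVec(vec![1u32, 2, 3])); // Vec == Vec
    assert!(v == &[1u32, 2, 3][..]);       // Vec == slice
    println!("eq works");
}
```

With these in place, `assert_eq!` on structs containing the arena-backed `Vec` works again via `#[derive(PartialEq)]`.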
Contrived example:
use bumpalo::Bump;
use std::alloc::Layout;

fn main() {
    let bump = Bump::new();
    let layout = Layout::from_size_align(1024, 32).unwrap();
    bump.alloc_layout(layout);
}
Blows up with:
error: process didn't exit successfully: `target\debug\crash.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)
This only seems to happen with alignments greater than 16, and only if the allocation is above a certain size (maybe big enough to overflow the current chunk?).
Hi. One test did not pass -- `readme_up_to_date`:
Checking that `cargo readme > README.md` is up to date...
thread 'cargo_readme_up_to_date' panicked at 'Run `cargo readme > README.md` to update README.md', tests/readme_up_to_date.rs:20:9
I am using rust/cargo provided by the distribution (Fedora).