fitzgen / bumpalo
A fast bump allocation arena for Rust
Home Page: https://docs.rs/bumpalo
License: Apache License 2.0
I agree that the fix made in #45 is the right one; however, it has a problem: it can introduce padding bytes in the output of `iter_allocated_chunks` for large alignments.
I think it is fine to introduce padding for large alignments, but we need to decide where to draw the line between supported and not supported -- and we need to improve the documentation to reflect this.
I do not have any strong opinions on how large an alignment we should support. I guess it would be nice to support up to alignment 16, since that is the largest alignment that real-world code can sometimes need. However, I do not think that anybody who has 16-byte alignment requirements would also want to use `iter_allocated_chunks`.
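To make the padding concrete, here is a small self-contained sketch (not bumpalo's actual code, and written with an upward bump for simplicity; the arithmetic is the same in either direction): whenever the current pointer is not already aligned, the gap up to the next aligned address becomes padding bytes that `iter_allocated_chunks` would expose.

```rust
/// Round `ptr` up to the next multiple of `align` (align must be a power of two).
fn align_up(ptr: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (ptr + align - 1) & !(align - 1)
}

/// Bytes of padding introduced before an allocation at `ptr` with `align`.
fn padding_for(ptr: usize, align: usize) -> usize {
    align_up(ptr, align) - ptr
}

fn main() {
    // An 8-aligned bump pointer needs no padding for align <= 8 ...
    assert_eq!(padding_for(0x1008, 8), 0);
    // ... but an allocation with align 16 at the same spot inserts 8 padding bytes.
    assert_eq!(padding_for(0x1008, 16), 8);
    println!("ok");
}
```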
Here is some code I used to test this:
use bumpalo::Bump;

const SUPPORTED_ALIGNMENTS: &[usize] = &[1, 2, 4, 8];

quickcheck::quickcheck! {
    fn test_alignment_chunks(sizes: Vec<usize>) -> () {
        for &alignment in SUPPORTED_ALIGNMENTS {
            let mut b = Bump::new();
            let mut sizes = sizes.iter().map(|&size| (size % 10) * alignment).collect::<Vec<_>>();
            for &size in &sizes {
                let layout = std::alloc::Layout::from_size_align(size, alignment).unwrap();
                let ptr = b.alloc_layout(layout).as_ptr() as *const u8 as usize;
                assert_eq!(ptr % alignment, 0);
            }
            for chunk in b.iter_allocated_chunks() {
                let mut remaining = chunk.len();
                while remaining > 0 {
                    let size = sizes.pop().expect("too many bytes in the chunk output");
                    assert!(remaining >= size, "returned chunk contained padding");
                    remaining -= size;
                }
            }
            assert_eq!(sizes.into_iter().sum::<usize>(), 0);
        }
    }
}
Hello
I was pondering how to make bumpalo work with multiple threads. Let's say I want to allocate a bunch of things, but do it in parallel, and then keep the results around for further processing. Single-threaded works fine, like this:
let bump = Bump::new();
let stuff: Vec<_> = (0..1_000_000).into_iter().map(|i| bump.alloc(i)).collect();
for i in &stuff {
    println!("{}", **i);
}
But this won't work, because `Bump` is not `Sync` (quite understandably):
let bump = Bump::new();
let stuff: Vec<_> = (0..1_000_000).into_par_iter().map(|i| bump.alloc(i)).collect();
for i in &stuff {
    println!("{}", **i);
}
`map_init` won't help either. While each thread now gets its own `Bump` to play with, the `Bump`s are dropped just after the iteration is done, so they can't be used afterwards.
I've done some experiments (some of them failed because bumpalo is so damn fast that even a single `fetch_add` is an order of magnitude slower) and finally came up with a PoC that works ‒ I wrap a pool of `Bump`s and borrow them out. When the borrow ends, the allocator is returned to the pool. Therefore the borrowed allocator can hand out allocations with the lifetime of the pool, so the code would be something like this:
let herd = Herd::new();
let stuff: Vec<_> = (0..1_000_000)
    .into_par_iter()
    .map_init(|| herd.get(), |bump, i| bump.alloc(i))
    .collect();
for i in &stuff {
    println!("{}", **i);
}
The PoC (it has no documentation, no tests, and only the `alloc` method so far, and I need to think hard and prove that what I do is actually sound) is here: https://github.com/vorner/bumpalo-herd
I want to write a blog post about it. But aside from that, I'd like to know: should I keep this as a separate crate and publish it myself, or would you like it added to bumpalo itself? I can send a PR with the implementation once I polish it.
The function `Layout::array` has been stabilized.
This means that the TODO in `lib.rs` can finally be fixed. The fix would mean that we no longer have to do defensive coding in `layout_for_array`. I think this might also be a small reduction in the generated code size, but I haven't actually tested it.
However, given that 1.44 has only just been released, now is probably not the time to begin depending on it. Do you think we should perhaps create a version policy for bumpalo?
In the latest Rust nightly we can allocate with a custom allocator (for example https://doc.rust-lang.org/nightly/alloc/vec/struct.Vec.html#method.new_in).
It would be great for bumpalo to provide the ability to use this. To achieve this, `&Bump` needs to implement `std::alloc::AllocRef`.
I find I end up writing
let x = bumpalo::format!(in cx.bump, "{}", var).into_bump_str();
a lot, so my suggestion is to add
trait ToBumpString: Display {
    fn to_bump_str<'a>(&self, arena: &'a Bump) -> &'a str {
        bumpalo::format!(in arena, "{}", self).into_bump_str()
    }
}
matching `ToString` (except that we want a `&str`).
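For comparison, this mirrors the blanket-impl pattern behind `ToString`. A self-contained model of that pattern, with std's `String` standing in for the arena-backed string (the real method would take a `&'a Bump` and return a `&'a str`):

```rust
use std::fmt::Display;

// Hypothetical sketch: the real trait would return a `&'a str` allocated in
// a `Bump`; here std's `String` stands in so the example is self-contained.
trait ToBumpString: Display {
    fn to_bump_string(&self) -> String {
        format!("{}", self)
    }
}

// Blanket impl, exactly as `ToString` does it for every `Display` type.
impl<T: Display + ?Sized> ToBumpString for T {}

fn main() {
    assert_eq!(42.to_bump_string(), "42");
    assert_eq!("hi".to_bump_string(), "hi");
    println!("{}", 42.to_bump_string());
}
```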
I was wondering if it might be possible to write a variant of the `reset` function that does not deallocate memory. Instead it would go through all the chunks and set `chunk.ptr` to `chunk.data`. Afterwards it would set `bump.current_chunk_footer` to `bump.all_chunk_footers`.
For this to make sense, we would have to modify either `alloc_layout_fast` or `alloc_layout_slow` to check for additional chunks before allocating a new one.
We could for instance insert code similar to this at the beginning of `alloc_layout_slow`:
while let Some(next) = self.current_chunk_footer.get().as_ref().next.get() {
    self.current_chunk_footer.set(next);
    if let Some(ptr) = self.try_alloc_layout_fast(layout) {
        return ptr;
    }
}
Alternatively we could do a similar loop directly inside `try_alloc_layout_fast`.
What do you think? Would you accept a PR implementing this?
For context: I am looking into the possibility of implementing the flatbuffers crate on top of bumpalo. One of the core features of the library is the fact that once it has warmed up, it will not need to allocate additional memory to serialize further messages -- and this feature is not currently available in bumpalo.
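A crude, self-contained model of the proposed behavior: retain the chunk list on reset, rewind each chunk's cursor, and walk retained chunks before growing. All names here are invented for illustration and are not bumpalo's internals; allocations are assumed to be no larger than the chunk size.

```rust
/// Toy arena: each chunk is a fixed-size buffer plus a cursor.
struct Chunk { buf: Vec<u8>, used: usize }

struct ToyArena { chunks: Vec<Chunk>, current: usize, chunk_size: usize }

impl ToyArena {
    fn new(chunk_size: usize) -> Self {
        ToyArena { chunks: vec![Chunk { buf: vec![0; chunk_size], used: 0 }], current: 0, chunk_size }
    }

    /// Allocate `n` bytes (n <= chunk_size), trying retained chunks before growing.
    fn alloc(&mut self, n: usize) -> &mut [u8] {
        while self.chunks[self.current].used + n > self.chunk_size {
            if self.current + 1 < self.chunks.len() {
                self.current += 1; // reuse an already-allocated chunk
            } else {
                self.chunks.push(Chunk { buf: vec![0; self.chunk_size], used: 0 });
                self.current += 1;
            }
        }
        let c = &mut self.chunks[self.current];
        let start = c.used;
        c.used += n;
        &mut c.buf[start..start + n]
    }

    /// Rewind all cursors but keep every chunk's memory.
    fn reset_retaining(&mut self) {
        for c in &mut self.chunks { c.used = 0; }
        self.current = 0;
    }

    fn chunk_count(&self) -> usize { self.chunks.len() }
}

fn main() {
    let mut a = ToyArena::new(16);
    for _ in 0..10 { a.alloc(8); } // warm-up pass forces growth
    let warmed = a.chunk_count();
    a.reset_retaining();
    for _ in 0..10 { a.alloc(8); } // second pass needs no new chunks
    assert_eq!(a.chunk_count(), warmed);
    println!("chunks after warm-up and reuse: {}", warmed);
}
```

Once warmed up, a second pass over the same workload allocates no new chunks, which is exactly the flatbuffers-style property described above.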
Noticed this while checking the license before running `mach vendor`.
It seems that for `T: Sync`, `Vec<'bump, T>` should implement `Sync` as well.
With #37 implemented, I think bumpalo is much closer to being a possible backend for flatbuffers. However there are still a few features that would be needed, some of which might require guarantees that bumpalo is unable to provide.
I am hoping we can use this issue to discuss whether this sounds realistic or not. To use bumpalo, we would need:
- An `allocated_bytes()` method on the `Bump`, or a method that returns both a pointer and the actual number of bytes allocated.
- Perhaps a `fn alloc_padding(&self, align: usize) -> &mut [u8]` which gives a slice with `0 <= len < align`?

What do you think? Does it make sense to continue pursuing this?
See later post.
The following code will panic inside `try_alloc_layout_fast` on line 612 when `debug_assertions` are turned on:
fn main() {
    let layout = std::alloc::Layout::from_size_align(1, 0x1000).unwrap();
    bumpalo::Bump::new().alloc_layout(layout);
}
`MaybeUninit` wasn't stable when these APIs were added. Now it is.
Note that this is a breaking change and will require a version bump.
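For reference, the shape of the now-stable API (standard library only, nothing bumpalo-specific): reserve space first, initialize it explicitly, and only then assert initialization.

```rust
use std::mem::MaybeUninit;

// Reserve space without initializing it, write into it later, and only then
// assert that it is initialized.
fn init_later() -> u64 {
    let mut slot: MaybeUninit<u64> = MaybeUninit::uninit();
    slot.write(42); // explicit initialization
    // SAFETY: `slot` was just written above.
    unsafe { slot.assume_init() }
}

fn main() {
    assert_eq!(init_later(), 42);
    println!("{}", init_later());
}
```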
Was just wondering if there is some limitation preventing the `io::Write` trait from being implemented on the `Vec` fork in the bumpalo crate? Is there any hidden complexity somewhere that I am not aware of?
Thank you very much for this crate, it's pretty useful and well-documented.
I believe the implementation made in #92 could use a specific `grow_zeroed` implementation.
If I read https://github.com/rust-lang/rust/blob/ee88f46bb5e27c4d9f30326e69ff2298dcbeecb1/library/core/src/alloc/mod.rs#L253-L277 correctly, it could use a more efficient one. As currently implemented, it zeroes `new_layout.size()` bytes where it could zero only `new_layout.size() - old_layout.size()`.
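The distinction can be modeled without any allocator machinery: on grow, only the bytes past the old length need zeroing, since the old bytes are copied over intact. A sketch with an invented helper name:

```rust
/// Grow `buf` to `new_len`, zeroing only the newly exposed tail rather than
/// the whole buffer. Returns how many bytes were zeroed.
fn grow_zeroed_delta(buf: &mut Vec<u8>, new_len: usize) -> usize {
    let old_len = buf.len();
    assert!(new_len >= old_len);
    buf.resize(new_len, 0); // zeroes only new_len - old_len bytes
    new_len - old_len
}

fn main() {
    let mut buf = vec![1u8; 8];
    let zeroed = grow_zeroed_delta(&mut buf, 20);
    assert_eq!(zeroed, 12);                    // new size minus old size
    assert!(buf[..8].iter().all(|&b| b == 1)); // old data untouched
    assert!(buf[8..].iter().all(|&b| b == 0)); // only the tail is zeroed
    println!("zeroed {} bytes", zeroed);
}
```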
Copying the miri CI script from the `typed-arena` crate:
set -ex
MIRI_NIGHTLY=nightly-$(curl -s https://rust-lang.github.io/rustup-components-history/x86_64-unknown-linux-gnu/miri)
echo "Installing latest nightly with Miri: $MIRI_NIGHTLY"
rustup default "$MIRI_NIGHTLY"
cargo clean
rustup component add miri
cargo miri setup
cargo miri test
Still waiting on the full test suite to finish running; this is taking quite a bit longer than I'd anticipated.
However, the `realloc` tests are definitely failing, and I'm not sure I totally understand what miri is complaining about here. cc @RalfJung
error: Miri evaluation error: a raw memory access tried to access part of a pointer value as raw bytes
--> /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/intrinsics.rs:1549:5
|
1549 | copy(src, dst, count)
| ^^^^^^^^^^^^^^^^^^^^^ Miri evaluation error: a raw memory access tried to access part of a pointer value as raw bytes
|
note: inside call to `core::intrinsics::copy::<u8>` at src/lib.rs:1070:17
--> src/lib.rs:1070:17
|
1070 | ptr::copy(ptr.as_ptr(), p.as_ptr(), new_size);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside call to `<&Bump as alloc::Alloc>::realloc` at src/lib.rs:1118:21
--> src/lib.rs:1118:21
|
1118 | let q = (&b).realloc(p, layout, 11).unwrap();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside call to `tests::test_realloc` at src/lib.rs:1094:5
--> src/lib.rs:1094:5
|
1094 | / fn test_realloc() {
1095 | | use crate::alloc::Alloc;
1096 | |
1097 | | unsafe {
... |
1138 | | }
1139 | | }
| |_____^
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232:5
= note: inside call to `<[closure@src/lib.rs:1094:5: 1139:6] as core::ops::FnOnce<()>>::call_once - shim` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232:5
= note: inside call to `<fn() as core::ops::FnOnce<()>>::call_once - shim(fn())` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:543:5
= note: inside call to `tests::test::__rust_begin_short_backtrace::<fn()>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:534:30
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232:5
= note: inside call to `<[closure@DefId(5:635 ~ test[cf55]::run_test[0]::{{closure}}[3]) 0:fn()] as core::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:1022:9
= note: inside call to `<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send> as core::ops::FnOnce<()>>::call_once` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:318:9
= note: inside call to `<std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>> as core::ops::FnOnce<()>>::call_once` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:40
= note: inside call to `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>>, ()>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:270:13
= note: inside call to `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>>>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
= note: inside call to `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<core_alloc::boxed::Box<dyn core::ops::FnOnce() + core::marker::Send>>, ()>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:567:18
= note: inside call to `tests::test::run_test_in_process` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:474:21
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:495:13
= note: inside call to `tests::test::run_test::run_test_inner` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:531:28
= note: inside call to `tests::test::run_test` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:300:13
= note: inside call to `tests::test::run_tests::<[closure@DefId(5:427 ~ test[cf55]::console[0]::run_tests_console[0]::{{closure}}[2]) 0:&mut tests::test::console::ConsoleTestState, 1:&mut core_alloc::boxed::Box<dyn tests::test::formatters::OutputFormatter>]>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/console.rs:295:5
= note: inside call to `tests::test::run_tests_console` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:121:15
= note: inside call to `tests::test::test_main` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libtest/lib.rs:140:5
= note: inside call to `tests::test::test_main_static`
= note: inside call to `main` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:34
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:73
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:136:5
= note: inside call to `std::sys_common::backtrace::__rust_begin_short_backtrace::<[closure@DefId(6:6014 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]::{{closure}}[0]) 0:&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:13
= note: inside call to closure at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:292:40
= note: inside call to `std::panicking::r#try::do_call::<[closure@DefId(6:6013 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:270:13
= note: inside call to `std::panicking::r#try::<i32, [closure@DefId(6:6013 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe]>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
= note: inside call to `std::panic::catch_unwind::<[closure@DefId(6:6013 ~ std[848d]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn core::ops::Fn() -> i32 + core::marker::Sync + std::panic::RefUnwindSafe], i32>` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:51:25
= note: inside call to `std::rt::lang_start_internal` at /home/fitzgen/.rustup/toolchains/nightly-2019-12-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:5
= note: inside call to `std::rt::lang_start::<()>`
error: aborting due to previous error
test tests::test_realloc ...
error: could not compile `bumpalo`.
To learn more, run the command again with --verbose.
comment from `reset()`:
"/// If this arena has allocated multiple chunks to bump allocate into, then
/// the excess chunks are returned to the global allocator."
It would be good to have an option to reuse all of the chunks instead of returning them to the global allocator. Perhaps the assumption in the code is that allocating chunks is cheap. This is not necessarily true for large allocations - in that case the cost is more proportional to the size of the allocation, and the number of `alloc()` calls becomes less relevant; e.g. 1 alloc of 1 GB costs about the same as 1024 allocs of 1 MB. Most of the work is in the soft page faults.
Is the image in the readme a logo? If so, it is interesting. At first glance I'd prefer to call it a "bullfight". Hahahahaha. I just thought the project was interesting, so I wanted to say something about it.
I tried to add these `try_*` methods (since I need fallible allocation) to `Vec` by wrapping the `RawVec` methods of the same names. However, they still seem to be infallible due to the alloc impl always returning `Ok`? https://github.com/fitzgen/bumpalo/blob/master/src/lib.rs#L1076
Would it be possible to make them fallible?
Hello, while I'm using the `Vec`s and `String`s of bumpalo, I realized that the `std::borrow::Borrow` trait is not implemented for these types even though `std::convert::AsRef` is implemented.
I guessed that the following statement in `Borrow`'s documentation might be related, but as far as I checked the code, bumpalo's collections fulfill the condition:
In particular `Eq`, `Ord` and `Hash` must be equivalent for borrowed and owned values: `x.borrow() == y.borrow()` should give the same result as `x == y`.
Does anyone know the reason why it is not implemented?
Thanks,
Would it be feasible to have the bumpalo `Vec` type work with serde? That way we can handle deserializing variable-length sequences of references to borrowed data. That would be a big improvement over using the global allocator.
Here are three example types. The first is not Deserialize with serde, because it requires borrowing a slice of references to strings, which is a variable-length data structure. The second is Deserialize with serde, but it causes expensive heap allocations. The third could be possible using bumpalo.
#[derive(Serialize, Deserialize)]
struct NotSerde<'a> {
    #[serde(borrow)]
    items: &'a [&'a str],
}

#[derive(Serialize, Deserialize)]
struct YesSerdeButRequiresHeapAllocation<'a> {
    #[serde(borrow)]
    items: Vec<&'a str>,
}

#[derive(Serialize, Deserialize)]
struct YesSerdeWithBumpalo<'a> {
    #[serde(borrow)]
    items: bumpalo::collections::Vec<'a, &'a str>,
}
It would be very convenient to support allocating `OsStr` and `Path` instances in a `Bump`.
Unfortunately the underlying byte representation isn't exposed on Windows, so we'll probably have to return `Cow<'bump, OsStr>` or `Cow<'bump, Path>`, which may defeat the purpose :(
Instead of forking from libstd directly ourselves.
This would involve merging `String` upstream into that crate, plus any other collections we might end up needing.
I'm working with an application that prints diagnostic messages after running an operation to show how much memory was used during that operation, then proceeds on with the next operation. However, because the next operation relies on memory allocated in the previous operation, there are still active immutable borrows from the `Bump` allocator at the point where diagnostic information should be logged. Therefore, I cannot simply use `arena.iter_allocated_chunks().map(|chunk| chunk.len()).sum()` to return the allocated size, because that requires a mutable borrow.
Would you be willing to add a method for querying such information from an immutable `Bump` instance? Something like `Bump::allocated_bytes(&self) -> usize`.
The current `new` methods use the `Global` allocator to get memory to work with. Provide an `unsafe` way to create a bump allocator from a slice denoting unused memory.
Is there a fast way to copy objects from one bump into another bump, beyond just cloning and allocating?
Or, would it be better to just keep a bump around for each set of objects (i.e. for caching)?
We should make this work out so that safety is guaranteed by the types. So a sub-arena should exclusively borrow its parent (whether that is the top-level arena or another sub-arena). And the sub-arena shouldn't expose its parent's lifetime, so that references allocated through the sub-arena cannot be used after we drop the sub-arena, and therefore it is safe for the sub-arena's `Drop` impl to reset the bump pointer to the point it was at when the sub-arena was constructed.
It is unclear how to add support for backing `bumpalo::collections::Vec` et al. with a sub-arena, and even whether this is desirable versus just waiting for custom allocators to stabilize, now that there is movement there.
cc @cfallin
If I understand correctly, two `Vec`s from the same `Bump` have separate allocations, correct? It seems like you should be able to consume them into distinct mutable slices, just like you can currently do immutably with `into_bump_slice`.
I am considering writing a function `unsafe fn iter_allocated_chunks(&mut self) -> ChunkIterator<'a>` and then re-implementing `each_allocated_chunk` on top of this function.
Would you be interested in accepting a PR that does this?
Within reason. Should probably have a max chunk size.
To reduce memory allocations, I'm trying to use bumpalo to create my own Boxed futures type.
I'm basing this off of an idiom present in the futures crate:
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a + Send>>;
https://docs.rs/futures/0.3.5/futures/future/type.BoxFuture.html
When I try this with `bumpalo::boxed::Box`, however, it seems that I can't convert an `impl Future` into a `dyn Future` the same way that I can with `std::boxed::Box`.
Here is some example code demonstrating the issue (I wish the playground had bumpalo!):
use std::future::Future;
use std::pin::Pin;

/// This compiles:
fn f(
    fut: impl Future<Output = ()> + Send + 'static,
) -> Pin<Box<dyn Future<Output = ()> + Send + 'static>> {
    Box::pin(fut)
}

// This does not compile:
// fn g<'a>(
//     b: &'a bumpalo::Bump,
//     fut: impl Future<Output = ()> + Send + 'static,
// ) -> Pin<bumpalo::boxed::Box<'a, dyn Future<Output = ()> + Send + 'static>> {
//     bumpalo::boxed::Box::pin_in(fut, &b)
// }
Is there something I'm missing with how to use this library? Thank you!
Currently, to bump-allocate any slice an existing slice is needed, which is not always convenient. I guess the current workaround is to use the `alloc_layout` method, but a method that doesn't require `unsafe` would be handy. Another handy addition would be a method that allocates a slice and fills it with copies/clones of a single value.
Not really an issue, but I thought I'd report it all the same, for the sake of symmetry with `std::vec!`.
Is there a recipe for creating an `Rc` with bumpalo? I'm trying to avoid the heap allocation(s) that happen inside of `Rc`.
(For reference, using the `Rc::from_box` function still appears to cause heap allocations.)
Thanks!
I'm very confused and don't know what is wrong. When I run this code in a unit test with `cargo test`, it fails with SIGSEGV. Here is my test code:
https://github.com/Riey/bumpalo/tree/unit-test-failed
But when I run the same code in `main.rs`, it succeeds.
Asking this because I'd like to store references into the bump along with the bump itself, via `OwningRef` (or by raw pointer directly).
let bumps: Vec<OwningRef<Bump, T>>;
// or, more unsafely but more efficiently
let bumps: Vec<(Bump, Vec<*mut T>)>;
If `bumps` grows, it will need to be resized. Will the references into the bump remain stable, meaning it's still okay to cast `*mut T` back to `&mut T`?
If not, how would you go about stably storing the bump alongside objects it has allocated?
I know Dodrio does this for `CachedNodes`, but I can't help but think that that particular implementation is unsound if the `CachedNodes` `FxHashMap` needs to be resized.
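The question hinges on whether moving the owner moves the owned memory; it does not, which can be checked directly with `Box` standing in for `Bump` (both own their storage through a pointer). Note this only shows address stability; the aliasing soundness of casting `*mut T` back to `&mut T` is a separate question this sketch does not answer.

```rust
// Record the address of a heap allocation, then force the Vec holding its
// owner to reallocate repeatedly, and check the address is unchanged.
fn first_addr_stable(pushes: u32) -> bool {
    let boxed = Box::new(123u32);
    let addr = &*boxed as *const u32 as usize;

    let mut holder = vec![boxed];
    for i in 0..pushes {
        holder.push(Box::new(i));
    }

    // The Vec's buffer moved, but the heap allocation did not.
    &*holder[0] as *const u32 as usize == addr
}

fn main() {
    assert!(first_addr_stable(10_000));
    println!("stable");
}
```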
We could use const generics to control the size, perhaps.
This means that the initial N bytes of allocation wouldn't need to hit the global allocator.
cc @cfallin
I wanted to add this test in #42, but it failed:
#[test]
fn test_alignment() {
    let b = Bump::new();
    let layout = std::alloc::Layout::from_size_align(64, 64).unwrap();
    for _ in 0..1024 {
        let ptr = b.alloc_layout(layout).as_ptr();
        assert_eq!(ptr as *const u8 as usize % 64, 0);
    }
}
I would be grateful if you could explain the difference between `std::vec::Vec` and `bumpalo::collections::vec::Vec`. Does `bumpalo::collections::vec::Vec` exist just to support `no_std`?
I suppose I am confused as to how the following lines would differ in terms of allocating a vector within a bump arena.
let bump = Bump::new();
bump.alloc(std::vec::Vec::new());

let bump = Bump::new();
bumpalo::collections::vec::Vec::new_in(&bump);
Hi,
Is there any reason Bumpalo allocates from high to low addresses? Or could this be some policy on which bumpalo is parameterized?
I would like to use something like bumpalo as an abstraction for a zero-copy message-encoding allocator. I have a proof-of-concept using Vec<Box<[u64]>>
today, with similar mut borrows for allocations (but much less intelligent resizing-in-place / new chunk sizing logic). If I can ditch my unsafe code for something with much smarter minds behind it, I’d really prefer that!
I think the only real change needed would be to adjust the internal ptr cursor initialization and replace some checked_sub with checked_add. But I don’t want to fork the crate just to do that.
Thoughts? I looked for closed issues and didn’t see the question raised earlier. Thanks!
Edit: I found and am starting to read https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html , but have not finished yet.
Edit2: The post's additional branch for bumping up only makes sense if you allow huge, nonsensical alignments in tandem with backing chunks arbitrarily close to `usize::MAX as *mut _`. If you limit alignments to reasonable values (like 64, which is the most you need for AVX-512, or maybe 4 kB at most), and ensure your allocated chunks end at or below `usize::MAX - MAX_ALIGN`, you can skip the extra condition on each individual allocation.
(My requirements happen to be even less general than that -- all allocations have the same, u64 alignment, and we could monomorphize on that, and the alignment branches go away anyway. But I understand that isn't generally true for bumpalo consumers. A version of bumpalo that monomorphized on some T might be generally useful?) Thanks!
Edit3: I think you could also just use a different alignment algorithm bumping up in the general case and avoid the extra branch. Here is the original algorithm described in the blog post:
pub unsafe fn alloc(
    &mut self,
    size: usize,
    align: usize,
) -> *mut u8 {
    debug_assert!(align > 0);
    debug_assert!(align.is_power_of_two());

    let ptr = self.ptr as usize;

    // Round the bump pointer up to the requested alignment.
    let aligned = try_null!(ptr.checked_add(align - 1)) & !(align - 1);
    let new_ptr = try_null!(aligned.checked_add(size));

    let end = self.end as usize;
    if new_ptr > end {
        // Didn't have enough capacity!
        return std::ptr::null_mut();
    }

    self.ptr = new_ptr as *mut u8;
    aligned as *mut u8
}
Here is a similar algorithm that avoids the extra alignment check while bumping up:
pub unsafe fn alloc2(
    &mut self,
    size: usize,
    align: usize,
) -> *mut u8 {
    debug_assert!(align > 0);
    debug_assert!(align.is_power_of_two());
    debug_assert!(size <= isize::MAX as _ && align <= isize::MAX as _);

    let ptr = self.ptr as usize;

    // Round the bump pointer down to the requested alignment. Ptr is
    // non-null, so we can subtract one without underflow. The resulting
    // aligned ptr is always exactly `align` below the actual ptr we will
    // allocate.
    let aligned = (ptr - 1) & !(align - 1);

    // Size and align must be <= isize::MAX in Rust; their addition cannot
    // overflow usize.
    let new_ptr = try_null!(aligned.checked_add(size + align));

    let end = self.end as usize;
    if new_ptr > end {
        // Didn't have enough capacity!
        return std::ptr::null_mut();
    }

    self.ptr = new_ptr as *mut u8;

    // If new_ptr didn't overflow, this cannot overflow either:
    (aligned + align) as *mut u8
}
And indeed, Godbolt shows it has two branches instead of three, like bumping down. It is still a handful of instructions longer than bumping down (13 -> 17), but none of them are branches.
The essential insight is that `(ptr + (A - 1)) & !(A - 1) == ((ptr - 1) & !(A - 1)) + A` due to modular arithmetic.
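The claimed identity can be checked exhaustively for small pointers and power-of-two alignments:

```rust
// Round-up-to-alignment versus round-down-then-add-alignment. `ptr` starts
// at 1 because the downward form subtracts one (matching the non-null
// precondition noted above).
fn identity_holds(ptr: usize, align: usize) -> bool {
    let round_up = (ptr + (align - 1)) & !(align - 1);
    let down_then_add = ((ptr - 1) & !(align - 1)) + align;
    round_up == down_then_add
}

fn main() {
    for align_pow in 0..7u32 {
        let align = 1usize << align_pow;
        for ptr in 1..4096usize {
            assert!(identity_holds(ptr, align), "ptr={} align={}", ptr, align);
        }
    }
    println!("identity holds");
}
```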
As far as gaming the microbenchmark, both bumping up and down might benefit from LLVM generating more in-order code for the non-error cases? But maybe it knows more about optimizing x86 branches than I do.
The current logo is a bit misleading, since it gives the initial impression that bumpalo allocates forwards. I suggest that the logo be flipped (except for the numbers, of course).
Thanks for making a lovely library!
I don't know enough about the internals to know whether `Bump` could safely implement `Sync` (the way it now does `Send`), but I recently found myself wanting to share a single `&Bump` between threads, so that several threads could allocate into the same arena in parallel, and then once all the threads had been joined, there could be additional processing done on the arena before dropping it.
Is it conceivable that `Bump` could implement `Sync` as well as `Send`? Or is there maybe a technique I'm not thinking of where it's possible to achieve the same result (multiple threads bump-allocating in parallel, and then using that memory on the main thread once the threads have completed) without `Bump` implementing `Sync`?
The copyless crate allows one to allocate a large struct on the heap without first creating it on the stack and then copying it. One should be able to do this with bumpalo too.
If you try to run this code, it will cause a stack overflow exception (even though we want the data on the heap):
fn main() {
    let bump = bumpalo::Bump::new();
    bump.alloc([0; 1_000_000_000]);
}
The copyless crate solves this by relying on the fact that LLVM seems to be able to optimize `x = FOO; memcpy(ptr, &x, sizeof(x));` into `*ptr = FOO`.
On the other hand, LLVM does not seem to be able to optimize `x = FOO; ptr = alloc(sizeof(x)); memcpy(ptr, &x, sizeof(x));` in the same way, and instead relies on the stack to store the value of `FOO`.
We have had a bunch of great work committed to master that I'd like to publish, and some of it is breaking (the reversal of bump direction, for example). Because it is breaking, I'd like to double check that we aren't missing any breaking changes that we can coalesce into a single release, rather than eventually doing multiple breaking releases.
@TethysSvensson, anything else on your radar? Anything else needed for your flatbuffers effort?
cc @vorner
`alloc_layout_slow` still contains a `dbg!` left over from debugging.
Hello
It seems that bumpalo's `Box` doesn't implement `CoerceUnsized` like the usual `Box` does. I'm not sure how one should be able to create upcastable smart pointers otherwise, as that's a nightly-only API :-(.
This code does not compile (and I can't figure out any equivalent that does):
use std::fmt::Debug;

use bumpalo::Bump;
use bumpalo::boxed::Box as BumpBox;

#[derive(Debug)]
struct X;

fn get_unsized(arena: &Bump) -> BumpBox<dyn Debug> {
    BumpBox::new_in(X, arena)
}
This fails with `mismatched types`. Trying `into()` and similar doesn't help.
I've looked at this example around `downcast`, and I'm pretty sure the example uses the ordinary `std::boxed::Box`, not bumpalo's one.
Is there a way to do the upcast that I'm not seeing? Should the example be modified to use that?
Hello
I was interested in how the thing works inside, so I read the code. I discovered this: https://github.com/fitzgen/bumpalo/blob/master/src/lib.rs#L539.
Now, if I understand it correctly, this can panic „unexpectedly“, even if the user didn't do anything outright wrong. In that case, the address will overflow and a panic will happen.
Would it make sense not to panic, but simply return None in that case? The problem here is simply that the allocation doesn't fit into the current chunk, and a new chunk needs to be created.
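The suggested behavior, treating overflow as "does not fit in this chunk", falls out naturally from checked arithmetic. A self-contained sketch (invented names, bumping downward like bumpalo, not the crate's actual code):

```rust
/// Try to bump-allocate `size` bytes below `ptr` with a power-of-two `align`,
/// inside a chunk starting at `start`. Returns None (rather than panicking)
/// when the allocation does not fit, including on arithmetic overflow.
fn try_alloc_down(ptr: usize, start: usize, size: usize, align: usize) -> Option<usize> {
    debug_assert!(align.is_power_of_two());
    let new_ptr = ptr.checked_sub(size)?; // overflow just means "doesn't fit"
    let aligned = new_ptr & !(align - 1); // round down to alignment
    if aligned < start {
        return None; // not enough room in this chunk: caller grabs a new one
    }
    Some(aligned)
}

fn main() {
    // Fits: 16 bytes, 8-aligned, inside [0x1000, 0x1100).
    assert_eq!(try_alloc_down(0x1100, 0x1000, 16, 8), Some(0x10F0));
    // Overflow: size larger than the pointer value gives None, no panic.
    assert_eq!(try_alloc_down(0x100, 0x0, usize::MAX, 8), None);
    println!("ok");
}
```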
It would be great if `Bump` could provide an `alloc_extend` method to allocate values from an iterator and return a slice (similarly to `TypedArena::alloc_extend`). Something along the lines of:
pub fn alloc_extend<T, I>(&self, iterable: I) -> &mut [T]
where
    T: BumpAllocSafe,
    I: IntoIterator<Item = T>,
error[E0277]: a value of type `bumpalo::collections::Vec<'_, (&[u8], u64)>` cannot be built from an iterator over elements of type `(&[u8], u64)`
--> main.rs:73:61
|
73 | let mut ordered: Vec<(&[u8], u64)> = counts.into_iter().collect();
| ^^^^^^^ value of type `bumpalo::collections::Vec<'_, (&[u8], u64)>` cannot be built from `std::iter::Iterator<Item=(&[u8], u64)>`
|
= help: the trait `FromIterator<(&[u8], u64)>` is not implemented for `bumpalo::collections::Vec<'_, (&[u8], u64)>`
`counts` has type `HashMap<&'bump [u8], u64>`.
Can we implement `FromIterator` and other traits to support `collect`, for feature parity with the standard library?
Since `Bump` contains `NonNull`, it doesn't implement `Send`. This prevents it from being used in a thread-local store, for example.
It should be sound to implement the trait for `Bump`, since the only reason `NonNull` isn't `Send` is that it may be aliased, which `Bump` doesn't do.
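The shape of the fix is a manual `unsafe impl`. A self-contained model with a `NonNull`-holding owner type (names invented for illustration, not bumpalo's internals):

```rust
use std::ptr::NonNull;
use std::thread;

/// Toy owner of a heap allocation through a raw NonNull, like Bump's
/// chunk-footer pointer. NonNull<T> is !Send by default.
struct PtrOwner {
    head: NonNull<u64>,
}

// SAFETY (the argument from this issue): the pointer is uniquely owned by
// PtrOwner and never aliased, so moving it to another thread is sound.
unsafe impl Send for PtrOwner {}

impl PtrOwner {
    fn new(v: u64) -> Self {
        let raw = Box::into_raw(Box::new(v));
        PtrOwner { head: NonNull::new(raw).unwrap() }
    }

    fn get(&self) -> u64 {
        unsafe { *self.head.as_ref() }
    }
}

impl Drop for PtrOwner {
    fn drop(&mut self) {
        // Reclaim the allocation we uniquely own.
        unsafe { drop(Box::from_raw(self.head.as_ptr())) };
    }
}

fn main() {
    let owner = PtrOwner::new(7);
    // Without the `unsafe impl Send`, this move would not compile.
    let handle = thread::spawn(move || owner.get());
    assert_eq!(handle.join().unwrap(), 7);
    println!("sent across threads");
}
```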
I moved some data into bumpalo arenas. I had to change the struct definitions from using `std::vec::Vec` to `bumpalo::Vec`, and as a result I had to remove `#[derive(PartialEq)]` from those structs. Then I had to disable some tests that were using `assert_eq!`.
Suppose I implement these:
impl<'a, 'b, A, B> PartialEq<bumpalo::Vec<'b, B>> for bumpalo::Vec<'a, A>
where A: PartialEq<B> { ... }

impl<'a, 'b, A, B> PartialEq<&'b [B]> for bumpalo::Vec<'a, A>
where A: PartialEq<B> { ... }

impl<'a, 'b, A, B> PartialEq<&'b mut [B]> for bumpalo::Vec<'a, A>
where A: PartialEq<B> { ... }
Does that sound like a PR you'd take?
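These impls mirror what std's `Vec` already provides. The pattern can be demonstrated self-containedly, with a wrapper type standing in for `bumpalo::collections::Vec`:

```rust
struct MyVec<A>(Vec<A>); // stand-in for bumpalo::collections::Vec<'a, A>

// Vec-to-Vec comparison across two element types related by PartialEq.
impl<A, B> PartialEq<MyVec<B>> for MyVec<A>
where
    A: PartialEq<B>,
{
    fn eq(&self, other: &MyVec<B>) -> bool {
        self.0.len() == other.0.len()
            && self.0.iter().zip(other.0.iter()).all(|(a, b)| a == b)
    }
}

// Vec-to-slice comparison, as proposed above for `&'b [B]`.
impl<A, B> PartialEq<&[B]> for MyVec<A>
where
    A: PartialEq<B>,
{
    fn eq(&self, other: &&[B]) -> bool {
        self.0.len() == other.len()
            && self.0.iter().zip(other.iter()).all(|(a, b)| a == b)
    }
}

fn main() {
    let v = MyVec(vec![1u32, 2, 3]);
    assert!(v == MyVec(vec![1u32, 2, 3])); // Vec == Vec
    assert!(v == &[1u32, 2, 3][..]);       // Vec == slice
    println!("eq works");
}
```

With these in place, `assert_eq!` on structs containing the arena-backed `Vec` works again via `#[derive(PartialEq)]`.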
Contrived example:
use bumpalo::Bump;
use std::alloc::Layout;

fn main() {
    let bump = Bump::new();
    let layout = Layout::from_size_align(1024, 32).unwrap();
    bump.alloc_layout(layout);
}
Blows up with:
error: process didn't exit successfully: `target\debug\crash.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)
This only seems to happen with alignments greater than 16, and only if the allocation is above a certain size (maybe big enough to overflow the current chunk?).
Hi. One test did not pass -- `readme_up_to_date`:
Checking that `cargo readme > README.md` is up to date...
thread 'cargo_readme_up_to_date' panicked at 'Run `cargo readme > README.md` to update README.md', tests/readme_up_to_date.rs:20:9
I am using rust/cargo provided by the distribution (Fedora).