
1brc's Introduction

One Billion Row Challenge

See here for the original description of this challenge.

I wrote a blog post explaining my ideas at https://curiouscoding.nl/posts/1brc .

Note that I do make some assumptions:

  • Lines in the input are at most 33 characters, including the newline.
  • City names are uniquely determined by their first and last 8 characters.
  • Each input line contains a city drawn uniformly at random from the set of available cities.
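The first two assumptions can be checked against an input before trusting the output. The following is a minimal sketch, not code from the repository: `prefix_suffix`, `find_collision`, and `max_line_len` are hypothetical helper names, and "characters" is interpreted as bytes.

```rust
use std::collections::HashMap;

/// Hypothetical helper: the (first 8 bytes, last 8 bytes) pair of a name.
/// Names shorter than 8 bytes yield shorter slices instead of reading out of bounds.
fn prefix_suffix(name: &[u8]) -> (&[u8], &[u8]) {
    let n = name.len();
    (&name[..n.min(8)], &name[n.saturating_sub(8)..])
}

/// Returns the first pair of distinct names sharing the same (prefix, suffix) pair.
fn find_collision<'a>(names: &[&'a str]) -> Option<(&'a str, &'a str)> {
    let mut seen: HashMap<(&[u8], &[u8]), &'a str> = HashMap::new();
    for &name in names {
        let key = prefix_suffix(name.as_bytes());
        if let Some(&prev) = seen.get(&key) {
            if prev != name {
                return Some((prev, name));
            }
        } else {
            seen.insert(key, name);
        }
    }
    None
}

/// Length in bytes of the longest line, counting the trailing newline.
fn max_line_len(data: &[u8]) -> usize {
    data.split_inclusive(|&b| b == b'\n')
        .map(|line| line.len())
        .max()
        .unwrap_or(0)
}

fn main() {
    let names = ["Abha", "Abidjan", "Abéché", "Accra", "Veracruz"];
    assert!(find_collision(&names).is_none());

    let data = b"Abha;18.0\nVeracruz;25.4\n";
    assert!(max_line_len(data) <= 33);
}
```

If `find_collision` returns a pair, or `max_line_len` exceeds 33, the input violates the assumptions and the results should not be trusted.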

On my i7-10750H CPU running at 4.6GHz, this results in:

  • 5.72s wall time on a single thread.
  • 1.01s wall time on 12 threads on 6 cores.

While you're here

The justfile contains some often used commands that can be run using just.

For this project, I wrote a small shell, just-shell, to conveniently run just commands. It has very aggressive auto-completion, which makes running a small set of commands very convenient.

1brc's People

Contributors

ragnargrootkoerkamp

1brc's Issues

Wrong result on test case, benchmark on 2950X

Hi, I tested your code with a test case produced by the original official generator (which still follows all your extra assumptions), but it gives a lot of wrong average values (off-by-one error) and maybe some other errors. The input file and result_ref.txt can be downloaded here: https://github.com/lehuyduc/1brc-simd

I tried 3 different commits: pdep parsing, latest cleanup, and fix simd imports for latest nightly, but they all give wrong results. I also benchmarked 2 of them.

Could you check what's wrong? Thanks!

Also, if you upload your measurements.txt file, I can test it on my PC for better comparison with your results.


Example of wrong values
In the pdep parsing and cleanup commits:

Veracruz: 37.3/37.3/37.3
Abha: -37.5/17.9/69.9
Abidjan: -30.0/25.9/78.1
Abéché: -23.6/29.4/81.0
Accra: -23.1/26.3/75.5
...
Veracruz: -22.7/25.4/77.6 <--- this appears 2 times with 2 different results!!! Hash map error?

Reference

{Abha=-37.5/18.0/69.9, Abidjan=-30.0/26.0/78.1, Abéché=-23.6/29.4/81.0, Accra=-23.1/26.4/75.5, ... Veracruz=-22.7/25.4/77.6,
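The mismatching means in the sample (17.9 vs 18.0, 25.9 vs 26.0, 26.3 vs 26.4) are each one tenth below the reference. That pattern would be consistent with the mean being truncated rather than rounded, although this is only a guess and is not confirmed anywhere in this thread. A minimal sketch of the difference, assuming temperatures are accumulated as integer tenths of a degree and that ties round toward positive infinity (the function names are hypothetical):

```rust
/// Mean in tenths of a degree, truncated toward zero by integer division.
fn mean_truncated(sum_tenths: i64, count: i64) -> i64 {
    sum_tenths / count
}

/// Mean in tenths of a degree, rounded to nearest, ties toward positive infinity.
/// Computes floor(sum / count + 1/2) using integers only; assumes count > 0.
fn mean_rounded(sum_tenths: i64, count: i64) -> i64 {
    (2 * sum_tenths + count).div_euclid(2 * count)
}

/// Formats a value in tenths as a decimal with one fractional digit.
fn format_tenths(t: i64) -> String {
    let sign = if t < 0 { "-" } else { "" };
    format!("{}{}.{}", sign, t.abs() / 10, t.abs() % 10)
}

fn main() {
    // 100 measurements whose true mean is 17.95 degrees.
    let (sum, count) = (17_950, 100);
    println!(
        "{} vs {}",
        format_tenths(mean_truncated(sum, count)),
        format_tenths(mean_rounded(sum, count))
    ); // prints "17.9 vs 18.0"
}
```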

The run command I use is below.

cargo run --quiet --release -- measurements.txt --print > log.txt 2>&1
time ./target/release/one-billion-row-challenge measurements.txt --print

I set the number of threads manually (and recompile each time):

let N_THREADS: usize = 6;

let records = run_parallel(
    data,
    &phf,
    num_slots,
    N_THREADS
);
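Recompiling for every thread count can be avoided by reading the count at startup. The following is a sketch only: the `N_THREADS` environment variable and the `thread_count` helper are assumptions for illustration, not something the project defines.

```rust
use std::thread;

/// Hypothetical helper: parses a requested thread count, falling back to the
/// parallelism reported by the standard library (or 1 if that is unavailable).
fn thread_count(env_value: Option<&str>) -> usize {
    env_value
        .and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or_else(|| {
            thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(1)
        })
}

fn main() {
    // N_THREADS as an environment variable is an assumption, not a project flag.
    let from_env = std::env::var("N_THREADS").ok();
    let n_threads = thread_count(from_env.as_deref());
    println!("using {n_threads} threads");
}
```

The resulting value could then be passed where the hard-coded `N_THREADS` is used above, for example `N_THREADS=6 ./target/release/one-billion-row-challenge measurements.txt`.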

Benchmark on 2950X, 2133MHz quad channel RAM, 3.65 GHz (32 threads) to 4.3 GHz (1 thread)

Commit pdep parsing 021bed3

32
total: 1.81s
real	0m1.965s
user	0m54.607s
sys	0m0.556s

16
total: 2.59s
real	0m2.745s
user	0m39.422s
sys	0m0.431s

12
total: 3.25s
real	0m3.412s
user	0m38.312s
sys	0m0.528s


8
total: 4.84s
real	0m4.994s
user	0m38.113s
sys	0m0.488s

6
total: 6.38s
real	0m6.537s
user	0m37.801s
sys	0m0.396s


4
total: 9.52s
real	0m9.678s
user	0m37.648s
sys	0m0.439s

1
total: 37.05s
real	0m37.211s
user	0m36.893s
sys	0m0.332s

Commit cleanup 01e4bc3

32
total: 589.26ms
real	0m0.744s
user	0m16.016s
sys	0m0.689s

16
total: 857.19ms
real	0m1.016s
user	0m12.893s
sys	0m0.512s

12
total: 1.11s
real	0m1.274s
user	0m12.684s
sys	0m0.480s

6
total: 2.12s
real	0m2.277s
user	0m12.399s
sys	0m0.417s


1
total: 12.19s
real	0m12.345s
user	0m12.004s
sys	0m0.365s
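To compare how the two commits scale, the "total" times above can be turned into speedup and parallel efficiency relative to the single-thread run. A small sketch (the helper names are hypothetical; the numbers are copied from the measurements above):

```rust
/// Speedup of a multi-threaded run relative to the single-thread run.
fn speedup(t1: f64, tn: f64) -> f64 {
    t1 / tn
}

/// Parallel efficiency: speedup divided by the number of threads.
fn efficiency(t1: f64, tn: f64, threads: u32) -> f64 {
    speedup(t1, tn) / f64::from(threads)
}

fn main() {
    // (threads, total seconds) for each commit, taken from the benchmark above.
    let pdep = [(32, 1.81), (16, 2.59), (12, 3.25), (8, 4.84), (6, 6.38), (4, 9.52)];
    let cleanup = [(32, 0.58926), (16, 0.85719), (12, 1.11), (6, 2.12)];

    for (label, t1, runs) in [
        ("pdep parsing", 37.05, &pdep[..]),
        ("cleanup", 12.19, &cleanup[..]),
    ] {
        for &(threads, tn) in runs {
            println!(
                "{label}: {threads:>2} threads, speedup {:.2}x, efficiency {:.0}%",
                speedup(t1, tn),
                100.0 * efficiency(t1, tn, threads)
            );
        }
    }
}
```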

Issue in to

Using commit 5c5b809 and running with this command
RUST_BACKTRACE=1 RUSTFLAGS=-Zsanitizer=address cargo +nightly run -Zbuild-std --target x86_64-unknown-linux-gnu measurements-small.txt
The output is:

thread 'main' panicked at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:362:9:
slice::get_unchecked requires that the range is within the slice
stack backtrace:
   0: rust_begin_unwind
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_nounwind_fmt::runtime
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:110:18
   2: core::panicking::panic_nounwind_fmt
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panicking.rs:122:9
   3: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::get_unchecked
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:362:9
   4: <core::ops::range::RangeTo<usize> as core::slice::index::SliceIndex<[T]>>::get_unchecked
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/index.rs:430:18
   5: core::slice::<impl [T]>::get_unchecked
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/slice/mod.rs:666:20
   6: one_billion_row_challenge::to_key
             at ./src/main.rs:131:35
   7: one_billion_row_challenge::build_perfect_hash::{{closure}}
             at ./src/main.rs:306:19
   8: one_billion_row_challenge::flatten_callback
             at ./src/main.rs:368:5
   9: one_billion_row_challenge::build_perfect_hash::{{closure}}
             at ./src/main.rs:328:9
  10: one_billion_row_challenge::iter_lines
             at ./src/main.rs:215:9
  11: one_billion_row_challenge::build_perfect_hash
             at ./src/main.rs:327:5
  12: one_billion_row_challenge::main
             at ./src/main.rs:400:35
  13: core::ops::function::FnOnce::call_once
             at /home/andres/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread caused non-unwinding panic. aborting.

I think this fails when the name length is < 8.
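One way to avoid the out-of-bounds read for short names is to zero-pad them before building the key. The sketch below is an illustration under that assumption and is not the repository's `to_key`: `key_parts` is a hypothetical name, and it returns the two 8-byte halves without guessing how the project combines them.

```rust
/// Hypothetical bounds-safe variant of a "first 8 bytes / last 8 bytes" key.
/// Names shorter than 8 bytes are zero-padded instead of being read out of bounds.
fn key_parts(name: &[u8]) -> (u64, u64) {
    let n = name.len();
    let h = n.min(8);

    let mut head = [0u8; 8];
    let mut tail = [0u8; 8];
    head[..h].copy_from_slice(&name[..h]);
    tail[..h].copy_from_slice(&name[n - h..]);

    (u64::from_le_bytes(head), u64::from_le_bytes(tail))
}

fn main() {
    // A 4-byte name no longer triggers the panic seen in the backtrace above.
    let (head, tail) = key_parts(b"Abha");
    println!("{head:#018x} {tail:#018x}");

    // Names of 8 bytes or more use their real first and last 8 bytes.
    let (head, tail) = key_parts("Veracruz de Ignacio".as_bytes());
    println!("{head:#018x} {tail:#018x}");
}
```

The copies cost more than an unchecked read, so a faster alternative would be to guarantee at least 8 readable bytes around every name and mask off the excess; that trade-off is left open here.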

Can't compile the project on my macbook pro with +nightly feature

Hey there!

Sorry for bothering you with a dumb question, but I can't compile this project because of a strange error (see the attachments).

Could you please help me investigate it? Most probably I'm making some stupid mistake or forgot something (I'm a newbie in the Rust ecosystem).

$ cargo +nightly build -r &> log.log

   Compiling one-billion-row-challenge v0.1.0 (/Users/almazmurzabekov/Desktop/PROJECTS/tmp/1brc)
error[E0599]: no method named `split_array_ref` found for reference `&[u8]` in the current scope
    --> src/main.rs:234:59
     |
234  |     let head: [u8; 8] = unsafe { *name.get_unchecked(..8).split_array_ref().0 };
     |                                                           ^^^^^^^^^^^^^^^
     |
help: there is a method `split_at` with a similar name, but with different arguments
    --> /Users/almazmurzabekov/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/mod.rs:1886:5
     |
1886 |     pub const fn split_at(&self, mid: usize) -> (&[T], &[T]) {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0599]: no method named `split_array_ref` found for reference `&[u8]` in the current scope
    --> src/main.rs:238:14
     |
236  |           *name
     |  __________-
237  | |             .get_unchecked(name.len().wrapping_sub(8)..)
238  | |             .split_array_ref()
     | |_____________-^^^^^^^^^^^^^^^
     |
help: there is a method `split_at` with a similar name, but with different arguments
    --> /Users/almazmurzabekov/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/mod.rs:1886:5
     |
1886 |     pub const fn split_at(&self, mid: usize) -> (&[T], &[T]) {
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0599`.
error: could not compile `one-billion-row-challenge` (bin "one-billion-row-challenge") due to 2 previous errors

Setup:

➜  1brc git:(master) ✗ cargo --version                              
cargo 1.79.0-nightly (48eca1b16 2024-04-12)
➜  1brc git:(master) ✗ rustup --version      
rustup 1.27.0 (bbb9276d2 2024-03-08)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.79.0-nightly (1cec373f6 2024-04-16)`
➜  1brc git:(master) ✗ 
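The error indicates that `split_array_ref` is not available on slices in this nightly toolchain. A possible workaround, assuming the code only needs the first and last 8 bytes as arrays, is `first_chunk` and `last_chunk`, which are stable since Rust 1.77. This is a sketch with hypothetical helper names (`head8`, `tail8`), not a patch taken from the repository, and unlike the unchecked original it returns `None` for names shorter than 8 bytes:

```rust
/// First 8 bytes of `name`, or `None` if it is shorter than 8 bytes.
fn head8(name: &[u8]) -> Option<[u8; 8]> {
    name.first_chunk::<8>().copied()
}

/// Last 8 bytes of `name`, or `None` if it is shorter than 8 bytes.
fn tail8(name: &[u8]) -> Option<[u8; 8]> {
    name.last_chunk::<8>().copied()
}

fn main() {
    let name = b"Veracruz!";
    println!("{:?} {:?}", head8(name), tail8(name));

    // Short names are reported instead of being read out of bounds.
    assert_eq!(head8(b"Abha"), None);
}
```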

attempt to divide by zero - 10k dataset

About the bug that I mentioned on Twitter:

time ./bin/one-billion-row-challenge --print > l.out
thread 'main' panicked at /rustc/62d7ed4a6775c4490e493093ca98ef7c215b835b/library/core/src/num/mod.rs:335:5:
attempt to divide by zero
stack backtrace:
   0: rust_begin_unwind
             at /rustc/62d7ed4a6775c4490e493093ca98ef7c215b835b/library/std/src/panicking.rs:647:5
   1: core::panicking::panic_fmt
             at /rustc/62d7ed4a6775c4490e493093ca98ef7c215b835b/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/62d7ed4a6775c4490e493093ca98ef7c215b835b/library/core/src/panicking.rs:144:5
   3: one_billion_row_challenge::main
   4: core::ops::function::FnOnce::call_once
             at /rustc/62d7ed4a6775c4490e493093ca98ef7c215b835b/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

https://drive.google.com/file/d/1sVA0yUuo5oZoeAQFoPJgaF0cQnq4fPCr/view?usp=sharing

https://drive.google.com/file/d/1hB72HaFC87Pjykv1mu0CIF4kXO5Qjefs/view?usp=sharing
