Hi, I tested your code with an original, officially generated test case (which still follows all your extra assumptions), but it produces many wrong average values (off by one in the last digit) and possibly other errors. The input file and result_ref.txt can be downloaded here: https://github.com/lehuyduc/1brc-simd
I tried 3 different commits: "pdep parsing", "latest cleanup", and "fix simd imports for latest nightly", but they all give wrong results. I also benchmarked 2 of them.
Could you check what's wrong? Thanks!
Also, if you upload your measurements.txt file, I can test it on my PC for a better comparison with your results.
Example of wrong values, in the "pdep parsing" and "cleanup" commits:
Veracruz: 37.3/37.3/37.3
Abha: -37.5/17.9/69.9
Abidjan: -30.0/25.9/78.1
Abéché: -23.6/29.4/81.0
Accra: -23.1/26.3/75.5
...
Veracruz: -22.7/25.4/77.6 <--- this appears 2 times with 2 different results!!! Hash map error?
Reference:
{Abha=-37.5/18.0/69.9, Abidjan=-30.0/26.0/78.1, Abéché=-23.6/29.4/81.0, Accra=-23.1/26.4/75.5, ... Veracruz=-22.7/25.4/77.6,
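One possible cause of averages that come out low by 0.1 is truncating the mean instead of rounding it. This is only a guess, not something confirmed in your code. The sketch below assumes temperatures are accumulated as integer tenths of a degree and that, as I understand the challenge rules, the mean is rounded to one decimal with ties going toward positive infinity. All function names are hypothetical.

```rust
/// Mean of `count` readings stored as integer tenths of a degree
/// (17.9 is stored as 179). The result is also in tenths, rounded to the
/// nearest tenth with ties going toward positive infinity.
/// ASSUMPTION: this mirrors my reading of the challenge's rounding rule.
fn mean_tenths_rounded(sum_tenths: i64, count: i64) -> i64 {
    assert!(count > 0, "count must be positive");
    // floor(sum/count + 1/2), computed exactly in integers.
    (2 * sum_tenths + count).div_euclid(2 * count)
}

/// Truncating variant, shown only for contrast: plain integer division
/// rounds toward zero, which drops the last digit for positive means.
fn mean_tenths_truncated(sum_tenths: i64, count: i64) -> i64 {
    sum_tenths / count
}

/// Format a value in tenths as a decimal string with one fractional digit.
fn format_tenths(tenths: i64) -> String {
    let sign = if tenths < 0 { "-" } else { "" };
    let abs = tenths.abs();
    format!("{}{}.{}", sign, abs / 10, abs % 10)
}
```

With an illustrative sum of 359 tenths over 2 readings (a mean of 17.95), the truncating variant formats as 17.9 and the rounding variant as 18.0, the same kind of difference as in the values above.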
The run commands I use are below.
cargo run --quiet --release -- measurements.txt --print > log.txt 2>&1
time ./target/release/one-billion-row-challenge measurements.txt --print
I set the number of threads manually (and recompile each time):
let N_THREADS: usize = 6;
let records = run_parallel(
    data,
    &phf,
    num_slots,
    N_THREADS
);
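To avoid recompiling for every thread count, the value could be read at startup instead. This is a minimal sketch, not how the repository does it: the environment variable name `N_THREADS` and both helper functions are hypothetical.

```rust
use std::env;

/// Parse a thread count from an optional string, falling back to `default`
/// when the value is missing, not a number, or zero.
fn parse_threads(value: Option<&str>, default: usize) -> usize {
    value
        .and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(default)
}

/// Read the thread count from the (hypothetical) N_THREADS environment
/// variable, using `default` when it is unset or invalid.
fn threads_from_env(default: usize) -> usize {
    parse_threads(env::var("N_THREADS").ok().as_deref(), default)
}
```

The result of `threads_from_env(6)` could then be passed as the last argument of `run_parallel` in place of the hard-coded constant.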
Benchmark on a 2950X with 2133 MHz quad-channel RAM, running at 3.65 GHz (32 threads) to 4.3 GHz (1 thread):
Commit pdep parsing (021bed3)

threads  total     real        user         sys
32       1.81s     0m1.965s    0m54.607s    0m0.556s
16       2.59s     0m2.745s    0m39.422s    0m0.431s
12       3.25s     0m3.412s    0m38.312s    0m0.528s
8        4.84s     0m4.994s    0m38.113s    0m0.488s
6        6.38s     0m6.537s    0m37.801s    0m0.396s
4        9.52s     0m9.678s    0m37.648s    0m0.439s
1        37.05s    0m37.211s   0m36.893s    0m0.332s
Commit cleanup (01e4bc3)

threads  total     real        user         sys
32       589.26ms  0m0.744s    0m16.016s    0m0.689s
16       857.19ms  0m1.016s    0m12.893s    0m0.512s
12       1.11s     0m1.274s    0m12.684s    0m0.480s
6        2.12s     0m2.277s    0m12.399s    0m0.417s
1        12.19s    0m12.345s   0m12.004s    0m0.365s
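To make the scaling in these numbers easier to compare, here is a small sketch that derives speedup and parallel efficiency from the "total" times. The constant holds the pdep parsing (021bed3) totals from above; the function names are my own.

```rust
/// (threads, total seconds) from the "total:" lines for commit 021bed3.
const PDEP_TOTALS: [(u32, f64); 7] = [
    (32, 1.81), (16, 2.59), (12, 3.25), (8, 4.84),
    (6, 6.38), (4, 9.52), (1, 37.05),
];

/// Speedup relative to the single-thread run, and parallel efficiency
/// (speedup divided by the number of threads).
fn scaling(t1_secs: f64, tn_secs: f64, threads: u32) -> (f64, f64) {
    let speedup = t1_secs / tn_secs;
    (speedup, speedup / threads as f64)
}

/// One formatted line per row, using the 1-thread row as the baseline.
fn report(rows: &[(u32, f64)]) -> Vec<String> {
    let t1 = rows
        .iter()
        .find(|row| row.0 == 1)
        .map(|row| row.1)
        .expect("need a 1-thread row as baseline");
    rows.iter()
        .map(|&(n, t)| {
            let (speedup, eff) = scaling(t1, t, n);
            format!("{:>2} threads: {:.2}x speedup, {:.0}% efficiency", n, speedup, eff * 100.0)
        })
        .collect()
}
```

For the 32-thread row this gives a speedup of about 20.5x over the single-thread run. Note that the 2950X has 16 physical cores, so the 32-thread figure includes SMT.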