Code Monkey home page Code Monkey logo

1brc's Introduction

.NET Solution for One Billion Row Challenge

My blog post: https://hotforknowledge.com/2024/01/13/1brc-in-dotnet-among-fastest-on-linux-my-optimization-journey/

Results

See results in the blog post.

A separate repository for automated benchmarks: https://github.com/buybackoff/1brc-bench.

Build & Run on Linux

To install .NET on Linux, follow official instructions.

To build, run ./build.sh.

To run JIT version: ./jit.sh /path/to/measurements.txt.

To run AOT version: ./aot.sh /path/to/measurements.txt.

1brc's People

Contributors

buybackoff avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

1brc's Issues

Last benchmark for kpocza's repo

I've made some improvements to my implementation (https://github.com/kpocza/1brc) mostly by decreasing hash collisions, exploiting cache locality.
I expect it to run 10-20% faster (especially for the 10k tests), but far slower than the best competitors.
Please rerun you benchmarks.

New C++ version 3x faster on 10k key dataset

https://github.com/lehuyduc/1brc-simd

Hi, I've updated my code to optimize for the 10k keys dataset. On my PC it's ~3x faster (excluding munmap time) than the commit you tested. Default dataset performance is a bit slower.

Just ./run_cpp.sh to compile and run.

To test the effect of hyper threading, you can do ./run_cpp.sh 12 12 (12 == number of threads total on your CPU). You will see interesting effects on the 10K dataset :D

Thanks! Looking forwards to your updated result.

Add implementation from Cameron Aavik

As per your recommendation on twitter, opening an issue here to see if you can add my solution to your .NET results comparison table: https://github.com/CameronAavik/1brc

I'm using Windows but according to the total process time my solution seems to be ~0.87s faster 2.22s -> 1.35s, and according to Stopwatch inside the program it is ~0.11s faster 1.45s -> 1.34s.

Add noahfalk entry

Hi @buybackoff - thanks for the great article and running this unofficial leaderboard. Would you mind including my entry in your run? https://github.com/noahfalk/1brc/

Its quite speedy from my own benchmarking and I hope the results will be reproducible for you. I put some more info in the README on my repo but if you have any questions about it feel free to reach out. Thanks!

Last refresh

Hello Victor,

I know is very close to the official cut time.
I submitted a PR on 1brc-bench with my repository.

Thanks in advance.

Some ideas of improvements

I was looking at the code and thought there are some things worth trying to further improve the execution time:

  • pass Summary by ref to Merge
  • Summary is 16 bytes and for min/max int is not needed, short should be enough.
  • .Select(x => (Name: x.Key.ToString(), x.Value)) for final result may not be needed, instead Utf8Span could implement IComparable and do comparison on Span property
  • if the previous point is implemented OrderBy can be replaced with something like Span<Utf8Span> spans = stackalloc Utf8Span[result.Count]; spans.Sort();
  • apply [SkipLocalsInit] to Summary since it looks like the Init method is always invoked for a new instance anyway
  • .ToList() may not be needed before calling Aggregate because Aggregate already has a version for ParallelQuery<TSource>
  • Console.Write($"{pair.Name} = {pair.Value}"); allocates an unnecessary string, could instead rewrite it as Console.Write(pair.Name); Console.Write(" = "); Console.Write(pair.Value) however not sure what is the performance of actual Console api, maybe the better strategy would be to just form 1 big resulting string and invoke a single Console.Write?

Sum should be long

@buybackoff max measurement is 99.9 or 100, 1B * 100 > int.MaxValue, so for the general case this would result in overflow. Using long probably won't hurt perf, significantly.

Implement ISpanFormattable on Summary

Hi,

I think you can win something by implementing ISpanFormattable on Summary. This will make the string interpolation in Console.Write($"{pair.Name}={pair.Value}") run faster. You avoid the ToString() method.

You can try this using this code. It was faster on my (slow) computer.

public struct Summary : ISpanFormattable {

    public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) {
        return destination.TryWrite($"{Min / 10.0:N1}/{Average / 10.0:N1}/{Max / 10.0:N1}", out charsWritten);
    }

    public string ToString(string? format, IFormatProvider? formatProvider) {
        return ToString();
    }

If it works you can even gain more performance by reimplementing the TryFormat method without the use of the TryWrite extensionmethod. It quite a lot of work but might work.

Regards,

Fons

Add entry for dzaima

https://github.com/dzaima/1brc - build with make a.out, run with THREADS_1BRC=[desired thread count] ./a.out path/to/measurements.txt.

Language column is mildly non-trivial - there's C, C++, and Singeli involved (Singeli generating gen.c which is committed to the repo to not require Singeli to build).

I expect it to roughly match lehuyduc's entry on both the original & 10K datasets (on my PC my solution is ~10% faster for 10K, and roughly the same for original).

(the repo also has a Java solution, but it's only optimized for the original dataset, and has extremely high variance (>2x; JIT compilation screwing up? haven't bothered looking into it), and with JIT startup time and worse codegen it struggles even at the best of times, so not worth including)

IndexOfNewlineChar optimization?

Not entirely sure I'm following your code (I can't find where the powers of 10 are actually used, for example).

But something struck me about IndexOfNewlineChar: if only \r\n and \n variants are supported (not the \r-only variant that is a relic of old Macs), you can look only for \n, and if found and idx > 0, then go see if it is preceeded by \r and thus needs a 2-character stride. Something like:

internal static int IndexOfNewlineChar(ReadOnlySpan<byte> span, out int stride)
 {
    stride = default;
    int idx = span.IndexOf((byte)'\n');
    if (idx < 0) return;
    int lastIdx = idx - 1;
    if (idx > 0 && span[lastIdx] == '\r') {
       stride = 2;
       return lastIdx;
    }
    stride = 1;
    return idx;
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.