Comments (6)

RagnarGrootKoerkamp commented on June 9, 2024
  • Right, I think I round toward zero instead of rounding to nearest. Not a big issue.
  • Cities appearing twice is definitely weird, though. Will check.
  • From these timings I'd infer that your CPU maybe doesn't support PDEP. Does it have the BMI2 feature flag? It may now be emulated by the compiler, which could be slow. I'll try to update the code to print a warning if it's not supported.
  • You can just run with -j1 to use only 1 thread.
  • I'll download your input and use that from now on.
  • The remaining performance difference is probably just down to differences in CPU. Yours does seem to have more L1 cache per thread (48kB vs. my 32kB), but that doesn't explain the slower speed.
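
The warning mentioned in the PDEP bullet could look roughly like the sketch below. It is only an illustration: `has_bmi2` and `bmi2_warning` are hypothetical names, not functions from the 1brc code, and the message text is made up. Note that the check only says whether the CPU reports BMI2; it cannot tell whether PDEP is implemented in fast hardware or in slower microcode.

```rust
/// Returns true if the running CPU reports the BMI2 feature (PDEP/PEXT).
/// `is_x86_feature_detected!` only exists on x86 targets, hence the cfg split.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn has_bmi2() -> bool {
    std::is_x86_feature_detected!("bmi2")
}

/// On other architectures there is no BMI2, so report it as unsupported.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn has_bmi2() -> bool {
    false
}

/// Hypothetical helper: the warning to print, if any.
/// Taking `supported` as a parameter keeps the logic easy to test.
fn bmi2_warning(supported: bool) -> Option<&'static str> {
    if supported {
        None
    } else {
        Some("warning: BMI2 (PDEP) not detected; performance may be poor")
    }
}
```

At startup one would call something like `if let Some(msg) = bmi2_warning(has_bmi2()) { eprintln!("{msg}"); }`.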

from 1brc.

RagnarGrootKoerkamp commented on June 9, 2024

Oh right, this:
https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#%3A%7E%3Atext%3Dwithout_affecting_flags-%2CParallel_bit_deposit_and_extract%2Cto_be_packed_or_unpacked.?wprov=sfla1

AMD processors before Zen 3 that implement PDEP and PEXT do so in microcode, with a latency of 18 cycles rather than (Zen 3) 3 cycles. As a result it is often faster to use other instructions on these processors.
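
To make concrete what PDEP computes, here is a minimal portable sketch in Rust. The name `pdep_soft` is hypothetical and does not come from the 1brc code; it is a plain bit-by-bit loop meant to show the semantics, not a fast replacement for the instruction.

```rust
/// Software version of PDEP (parallel bit deposit) on 64-bit values:
/// the low bits of `src` are placed, in order, at the positions of the
/// set bits of `mask`; all other result bits are zero.
fn pdep_soft(src: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut src_bit = 1u64; // selects the next low bit of `src`
    while mask != 0 {
        // Isolate the lowest set bit of the mask.
        let lowest = mask & mask.wrapping_neg();
        if src & src_bit != 0 {
            result |= lowest;
        }
        mask &= mask - 1; // clear that mask bit
        src_bit <<= 1; // move on to the next source bit
    }
    result
}
```

For example, with mask `0b11010` the three low bits of `src` land at bit positions 1, 3 and 4, so `pdep_soft(0b101, 0b11010)` gives `0b10010`.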

RagnarGrootKoerkamp commented on June 9, 2024

Fixed the wrong rounding; the initial bad city was an off-by-one in the initialization.
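
The difference between rounding toward zero and rounding to nearest can be sketched as follows. This assumes the mean is computed from an integer sum stored in tenths of a degree and a positive count, which the thread does not state; the function names are hypothetical, and breaking ties toward positive infinity is likewise an assumption.

```rust
/// Mean in tenths, truncated toward zero: this is what plain integer
/// division does in Rust.
fn mean_trunc(sum_tenths: i64, count: i64) -> i64 {
    sum_tenths / count
}

/// Mean in tenths, rounded to nearest, with ties going toward positive
/// infinity. Computes floor(sum/count + 1/2) using only integers.
/// Assumes `count > 0`.
fn mean_round(sum_tenths: i64, count: i64) -> i64 {
    (2 * sum_tenths + count).div_euclid(2 * count)
}
```

With a sum of 8 tenths over 3 measurements (a true mean of about 2.67 tenths), `mean_trunc` gives 2 while `mean_round` gives 3.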

I don't think I will make non-PDEP code, since it uses a different data storage format.

I'll make a new benchmark at some later point today.

lehuyduc commented on June 9, 2024

Thanks! Now the performance numbers make more sense (single thread, HT vs. no HT: not much difference).

It seems Intel's HT is just worse than AMD's.

lehuyduc commented on June 9, 2024

Great finding! Also, can you run 6 threads vs. 12 threads, both with HT on? Your benchmark currently only has 6 threads with HT off vs. 12 threads with HT on.

The difference is ~60% for both of our codes on an AMD 2950X. I'm curious what the number is on an Intel CPU.

RagnarGrootKoerkamp commented on June 9, 2024

Added the latest results here:
https://curiouscoding.nl/posts/1brc/
