Code Monkey home page Code Monkey logo

sse-popcount's Introduction

SIMD popcount

Sample programs for my article http://0x80.pl/articles/sse-popcount.html

https://travis-ci.org/WojciechMula/sse-popcount.svg?branch=master

Paper

Daniel Lemire, Nathan Kurz and I published an article Faster Population Counts using AVX2 Instructions.

Introduction

Subdirectory original contains code from 2008 --- it is 32-bit and GCC-centric. The root directory contains fresh C++11 code, written with intrinsics and tested on 64-bit machines.

There are two programs:

  • verify --- it tests if all non-lookup implementations counts bits properly;
  • speed --- benchmarks different implementations of popcount procedure; please read help to find all options (run the program without arguments).

There are several targets:

  • default --- builtin functions, SSE and popcnt instructions;
  • AVX2 --- all above plus AVX2 implementations;
  • AVX512BW --- all above plus experimental AVX512BW code;
  • AVX512VBMI --- all above plus experimental AVX512VBMI code;
  • AVX512 VPOPCNT --- all above plus experimental AVX512 VPOPCNT code (should be compilable with very recent GCC, software emulator doesn't support this extension yet);
  • arm --- builtin and ARM Neon implementations.

Type make help to find out details. To run the default target benchmark simply type make.

Available implementations

procedure description
lookup-8 lookup in std::uint8_t[256] LUT
lookup-64 lookup in std::uint64_t[256] LUT
bit-parallel naive bit parallel method
bit-parallel-optimized a bit better bit parallel
bit-parallel-optimized2 better utilization of 2- and 4-bit subwords
bit-parallel-mul bit-parallel with fewer instructions
bit-parallel32 naive bit parallel method (32 bit)
bit-parallel-optimized32 a bit better bit parallel (32 bit)
harley-seal Harley-Seal popcount (4th iteration)
sse-bit-parallel SSE implementation of bit-parallel-optimized (unrolled)
sse-bit-parallel-original SSE implementation of bit-parallel-optimized
sse-bit-parallel-better SSE implementation of bit-parallel with fewer instructions
sse-harley-seal SSE implementation of Harley-Seal
sse-lookup SSSE3 variant using pshufb instruction (unrolled)
sse-lookup-original SSSE3 variant using pshufb instruction
avx2-lookup AVX2 variant using pshufb instruction (unrolled)
avx2-lookup-original AVX2 variant using pshufb instruction
avx2-harley-seal AVX2 implementation of Harley-Seal
cpu CPU instruction popcnt (64-bit variant)
sse-cpu load data with SSE, then count bits using popcnt
avx2-cpu load data with AVX2, then count bits using popcnt
avx512-harley-seal AVX512 implementation of Harley-Seal
avx512bw-shuf AVX512BW implementation uses shuffle instruction
avx512vbmi-shuf AVX512VBMI implementation uses shuffle instruction
avx512-vpopcnt AVX512 VPOPCNT
builtin-popcnt builtin for popcnt
builtin-popcnt32 builtin for popcnt (32-bit variant)
builtin-popcnt-unrolled unrolled builtin-popcnt
builtin-popcnt-unrolled32 unrolled builtin-popcnt32
builtin-popcnt-unrolled-errata unrolled builtin-popcnt avoiding false-dependency
builtin-popcnt-unrolled-errata-manual unrolled builtin-popcnt avoiding false-dependency (asembly code)
builtin-popcnt-movdq builtin-popcnt where data is loaded via SSE registers
builtin-popcnt-movdq-unrolled builtin-popcnt-movdq unrolled
builtin-popcnt-movdq-unrolled_manual builtin-popcnt-movdq unrolled (assembly code)
neon-vcnt ARM Neon using VCNT
neon-HS Harley-Seal using Neon VCNT
aarch64-cnt ARMv8 Neon using CNT

Performance results

The subdirectory results contains performance results from various computers. If you can, please contribute.

Acknowledgments

  • Kim Walisch (@kimwalisch) wrote Harley-Seal scalar implementation.
  • Simon Lindholm (@simonlindholm) added unrolled versions of procedures.
  • Dan Luu (@danluu) agreed to include his procedures (builint-*) into this project. More details in Dan's article Hand coded assembly beats intrinsics in speed and simplicity

See also

  • libpopcnt --- library by Kim Walisch utilizing methods from our paper.

sse-popcount's People

Contributors

ashvardanian avatar chponte avatar dryman avatar eppie avatar gsuberland avatar jkflying avatar kimwalisch avatar michaelangel007 avatar seancmonahan avatar simonlindholm avatar wojciechmula avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sse-popcount's Issues

Cannot compile popcnt-avx2-harley-seal.cpp using MSVC 2015

Hi Wojciech,

I use your popcnt-avx2-harley-seal algorithm in my libpopcnt.h. Unfortunately it fails to compile on Windows using a recent MSVC 2015 compiler version:

C:\Users\kim\Desktop\libpopcnt-master>nmake -f Makefile.msvc

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

        cl /nologo /W3 /O2 /EHsc /D HAVE_POPCNT /arch:AVX2 /D HAVE_AVX2 test.cpp /Fotest.obj /Fetest.exe
test.cpp
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(356): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(356): error C2660: '_mm256_sub_epi8': function does not take 1 arguments
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(357): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(357): error C2088: '&': illegal for union
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(357): error C2660: '_mm256_add_epi8': function does not take 1 arguments
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(358): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(369): error C2676: binary '^': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(370): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(370): error C2088: '&': illegal for union
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(371): error C2676: binary '^': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(423): error C3861: '_mm256_extract_epi64': identifier not found
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(424): error C3861: '_mm256_extract_epi64': identifier not found
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(425): error C3861: '_mm256_extract_epi64': identifier not found
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(426): error C3861: '_mm256_extract_epi64': identifier not found
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\cl.EXE"' : return code '0x2'
Stop.

It seems like the MSVC compiler has poor AVX2 support (e.g. the _mm256_extract_epi64 intrinsic seems to be missing)!? Have you ever tried compiling your sse-popcount project using MSVC?

Here is a link to my libpopcnt.h

Thanks,
Kim

AVX2 & popcnt mix

Use AVX2 instructions to load data, then use YMM registers as a source for popcnt instructions. I'm curious how fast it will be.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.