despacer's Introduction

despacer's People

Contributors

aminya, aqrit, kloetzl, lemire, stolendata

despacer's Issues

make fails on Debian but succeeds on Mac OS X

suhubdyd@eos14:~/dev/despacer$ make
cc -fPIC -std=c99 -O3  -march=native -Wall -Wextra -Wshadow -o despacebenchmark ./benchmarks/despacebenchmark.c -Iinclude
In file included from ./benchmarks/despacebenchmark.c:6:0:
include/despacer.h: In function ‘cleanm256’:
include/despacer.h:377:5: warning: implicit declaration of function ‘_mm256_loadu2_m128i’ [-Wimplicit-function-declaration]
     __m256i mask = _mm256_loadu2_m128i((const __m128i *)despace_mask16 + maskhigh, (const __m128i *)despace_mask16 + masklow);
     ^
include/despacer.h:377:20: error: incompatible types when initializing type ‘__m256i’ using type ‘int’
     __m256i mask = _mm256_loadu2_m128i((const __m128i *)despace_mask16 + maskhigh, (const __m128i *)despace_mask16 + masklow);
                    ^
include/despacer.h: In function ‘avx2_despace_branchless’:
include/despacer.h:400:5: warning: implicit declaration of function ‘_mm256_storeu2_m128i’ [-Wimplicit-function-declaration]
     _mm256_storeu2_m128i((__m128i *)(bytes + pos + offset1), (__m128i *)(bytes + pos ),x);
     ^
Makefile:14: recipe for target 'despacebenchmark' failed
make: *** [despacebenchmark] Error 1
grok-machine:despacer dendisuhubdy$ make
cc -fPIC -std=c99 -O3  -march=native -Wall -Wextra -Wshadow -o despacebenchmark ./benchmarks/despacebenchmark.c -Iinclude
grok-machine:despacer dendisuhubdy$ ./despacebenchmark 
pointer alignment = 4096 bytes 
memcpy(tmpbuffer,buffer,N):  0.082031 cycles / ops
countspaces(buffer, N):  2.191406 cycles / ops
countspaces32(buffer, N):  0.730469 cycles / ops
despace(buffer, N):  1.578125 cycles / ops
despace32(buffer, N):  1.074219 cycles / ops
faster_despace(buffer, N):  1.318359 cycles / ops
faster_despace32(buffer, N):  1.609375 cycles / ops
despace64(buffer, N):  1.353516 cycles / ops
despace_to(buffer, N, tmpbuffer):  1.441406 cycles / ops
avx2_countspaces(buffer, N):  0.078125 cycles / ops
avx2_despace(buffer, N):  1.398438 cycles / ops
avx2_despace_branchless(buffer, N):  0.218750 cycles / ops
avx2_despace_branchless_u2(buffer, N):  0.205078 cycles / ops
sse4_despace(buffer, N):  0.486328 cycles / ops
sse4_despace_branchless(buffer, N):  0.320312 cycles / ops
sse4_despace_branchless32(buffer, N):  0.320312 cycles / ops
sse4_despace_branchless_u2(buffer, N):  0.203125 cycles / ops
sse4_despace_branchless_u4(buffer, N):  0.210938 cycles / ops
sse4_despace_branchless_mask8(buffer, N):  0.431641 cycles / ops
sse4_despace_trail(buffer, N):  1.126953 cycles / ops
sse42_despace_branchless(buffer, N):  0.496094 cycles / ops
sse42_despace_branchless_lookup(buffer, N):  0.501953 cycles / ops
sse42_despace_to(buffer, N,tmpbuffer):  1.050781 cycles / ops
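
The Debian log above suggests that the compiler's headers there do not declare _mm256_loadu2_m128i and _mm256_storeu2_m128i (older GCC headers lacked them), while the Mac toolchain does. One possible workaround, sketched here under that assumption, is to build the two operations from intrinsics that are more widely available. The compat_* names and the self-check are hypothetical and not part of despacer; the target attribute and __builtin_cpu_supports are GCC/Clang extensions.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical stand-in for _mm256_loadu2_m128i(hi, lo): two unaligned
   128-bit loads joined into one 256-bit register, low half first. */
__attribute__((target("avx2")))
static __m256i compat_loadu2_m128i(const __m128i *hi, const __m128i *lo) {
    __m256i v = _mm256_castsi128_si256(_mm_loadu_si128(lo));
    return _mm256_inserti128_si256(v, _mm_loadu_si128(hi), 1);
}

/* Hypothetical stand-in for _mm256_storeu2_m128i(hi, lo, v). The low half
   is written first, so an overlapping high store wins, as in the original. */
__attribute__((target("avx2")))
static void compat_storeu2_m128i(__m128i *hi, __m128i *lo, __m256i v) {
    _mm_storeu_si128(lo, _mm256_castsi256_si128(v));
    _mm_storeu_si128(hi, _mm256_extracti128_si256(v, 1));
}

/* Load two halves, check their order in the register, store them back. */
__attribute__((target("avx2")))
static int compat_u2_check_avx2(void) {
    uint8_t lo[16], hi[16], all[32], out_lo[16], out_hi[16];
    for (int i = 0; i < 16; i++) {
        lo[i] = (uint8_t)i;
        hi[i] = (uint8_t)(100 + i);
    }
    __m256i v = compat_loadu2_m128i((const __m128i *)hi, (const __m128i *)lo);
    _mm256_storeu_si256((__m256i *)all, v);
    compat_storeu2_m128i((__m128i *)out_hi, (__m128i *)out_lo, v);
    return memcmp(all, lo, 16) == 0 && memcmp(all + 16, hi, 16) == 0 &&
           memcmp(out_lo, lo, 16) == 0 && memcmp(out_hi, hi, 16) == 0;
}
#endif

/* Returns 1 if the stand-ins behave as expected, or if AVX2 is not
   available on this machine (in which case nothing is checked). */
int compat_u2_roundtrip_ok(void) {
#if defined(__x86_64__) || defined(__i386__)
    if (!__builtin_cpu_supports("avx2")) return 1;
    return compat_u2_check_avx2();
#else
    return 1;
#endif
}
```

In include/despacer.h the two failing calls could then be pointed at such stand-ins when the real intrinsics are missing; whether that matches how the project eventually resolved the issue is not known from this thread.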

has_space function

It would be nice to add a has_space function. Currently there is avx2_countspaces, which counts every space in the input. I think it should be possible to add another function that returns true as soon as it finds a space.
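
A minimal sketch of what such a function might look like, assuming that "space" means ' ', '\n' or '\r' (an assumption about despacer's definition, not taken from this issue). It uses SSE2 where the compiler enables it and a scalar loop otherwise; an AVX2 variant would follow the same pattern with 32-byte loads. The name has_space and its signature are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Assumption: these three bytes are the ones treated as whitespace. */
static bool is_space_byte(unsigned char c) {
    return c == ' ' || c == '\n' || c == '\r';
}

/* Hypothetical has_space: returns true as soon as a whitespace byte is
   seen, instead of counting all of them. */
bool has_space(const char *bytes, size_t howmany) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i spaces = _mm_set1_epi8(' ');
    const __m128i newline = _mm_set1_epi8('\n');
    const __m128i carriage = _mm_set1_epi8('\r');
    for (; i + 16 <= howmany; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(bytes + i));
        __m128i hit = _mm_or_si128(
            _mm_cmpeq_epi8(x, spaces),
            _mm_or_si128(_mm_cmpeq_epi8(x, newline),
                         _mm_cmpeq_epi8(x, carriage)));
        /* Any set bit means at least one of the 16 bytes matched. */
        if (_mm_movemask_epi8(hit) != 0) return true;
    }
#endif
    /* Scalar tail, and the whole input when SSE2 is not enabled. */
    for (; i < howmany; i++) {
        if (is_space_byte((unsigned char)bytes[i])) return true;
    }
    return false;
}
```

The early exit only pays off when whitespace tends to appear early; on input without any whitespace the cost should be similar to a counting pass.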

Use `-march=haswell` or similar flags instead of `-march=native`

The CMake file uses -march=native, so the resulting binaries may not run on other machines that lack the build host's instruction sets.
I think a better approach would be to use -march=haswell, which is a good default these days.

For AVX-512, a CMake option could be exposed.

despacer/CMakeLists.txt

Lines 17 to 19 in 579530b

target_compile_options(despacer PRIVATE /arch:native)
else()
target_compile_options(despacer PRIVATE -march=native)

-march=haswell includes AVX2 and the older SSE instruction sets:
https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

Mistakes in sse4_despace_branchless_mask8

I found a few mistakes in sse4_despace_branchless_mask8(). They probably don't affect the benchmark results, but they prevent the function from producing the correct string:

  1. _mm_or_si128(m1,m2) should be replaced with _mm_and_si128(m1,m2), because 0xFF & x == x, whereas 0xFF | x == 0xFF.

  2. The tables are probably wrong too (on Intel CPUs). The first line of despace_mask8_1 should read:

    0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x8,0x9,0xA,0xB,0xC,0xD,0xE,0xF,

instead of:

    0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7,0x6,0x5,0x4,0x3,0x2,0x1,0x0,

The first line of despace_mask8_2 should read:

    0x0,0x1,0x2,0x3,0x4,0x5,0x6,0x7,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,

instead of:

    0xf,0xe,0xd,0xc,0xb,0xa,0x9,0x8,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,
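
To illustrate the first point, the two proposed first rows can be combined byte by byte in plain C. This is a scalar sketch only. It assumes that m1 and m2 are the shuffle masks looked up from despace_mask8_1 and despace_mask8_2, and that row 0 corresponds to a block with no whitespace, so the combined mask should be the identity shuffle 0..15. A control byte with its high bit set makes a byte shuffle write zero to that lane, so a mask of all 0xff would wipe the block. The helper names are made up for this example.

```c
#include <stdint.h>

/* First rows of the two tables, as proposed in the issue. */
static const uint8_t mask1_row0[16] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                                       0xff, 0xff, 0x8,  0x9,  0xA,  0xB,
                                       0xC,  0xD,  0xE,  0xF};
static const uint8_t mask2_row0[16] = {0x0,  0x1,  0x2,  0x3,  0x4,  0x5,
                                       0x6,  0x7,  0xff, 0xff, 0xff, 0xff,
                                       0xff, 0xff, 0xff, 0xff};

/* Scalar model of _mm_and_si128 on 16 bytes. */
static void combine_and(const uint8_t *m1, const uint8_t *m2, uint8_t *out) {
    for (int i = 0; i < 16; i++) out[i] = m1[i] & m2[i];
}

/* Scalar model of _mm_or_si128 on 16 bytes. */
static void combine_or(const uint8_t *m1, const uint8_t *m2, uint8_t *out) {
    for (int i = 0; i < 16; i++) out[i] = m1[i] | m2[i];
}

/* Returns 1 if AND-ing the two rows yields the identity shuffle 0..15. */
int and_gives_identity(void) {
    uint8_t out[16];
    combine_and(mask1_row0, mask2_row0, out);
    for (int i = 0; i < 16; i++)
        if (out[i] != i) return 0;
    return 1;
}

/* Returns 1 if OR-ing the two rows yields 0xff in every lane. */
int or_gives_all_ff(void) {
    uint8_t out[16];
    combine_or(mask1_row0, mask2_row0, out);
    for (int i = 0; i < 16; i++)
        if (out[i] != 0xff) return 0;
    return 1;
}
```

With the 0xff padding used in these tables, only the AND combination leaves the real indices from each half intact.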

Find indices of white spaces within string

Amazing, amazing work!

I just wanted to mention that I would love to see this tweaked a bit so that, with the help of your simdprune library, one could find the indices of the whitespace characters within a given string instead of removing them. That would be a tremendous help in numerous string-cleaning tasks.
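
As a scalar baseline for what is being asked, here is a hedged sketch that records positions instead of removing bytes. It does not use simdprune; the function name, the uint32_t index type, and the choice of ' ', '\n' and '\r' as whitespace are assumptions for illustration. A SIMD version would presumably compute a comparison bitmask per block and expand it into indices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the position of every whitespace byte into
   out_idx and returns how many were found. out_idx must have room for
   howmany entries, because every position is stored speculatively. */
size_t find_space_indices(const char *bytes, size_t howmany,
                          uint32_t *out_idx) {
    size_t count = 0;
    for (size_t i = 0; i < howmany; i++) {
        unsigned char c = (unsigned char)bytes[i];
        int is_space = (c == ' ') | (c == '\n') | (c == '\r');
        out_idx[count] = (uint32_t)i; /* always store the position */
        count += (size_t)is_space;    /* keep it only if it was whitespace */
    }
    return count;
}
```

The store-then-advance pattern avoids a branch per byte, at the cost of requiring the output buffer to be as long as the input.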

faster despace64

Parts of HASZERO/HASVALUE can be eliminated if not matching signed bytes.
Other parts can be deferred and combined when checking multiple values.

https://gist.github.com/aqrit/6e73ca6ff52f72a2b121d584745f89f3

Other things of interest in that gist (speeds unknown):

  • SWAR method for removing bytes from a word
  • SSE2 using prefix sums and lots of shifting
  • SSSE3 using a 1024-byte LUT
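
The remark about HASZERO/HASVALUE can be made concrete with the classic SWAR zero-byte test. The sketch below is one reading of the suggestion, not the code from the gist. It assumes the targets are ' ', '\n' and '\r', all below 0x80, so x ^ target has the same high bits as x and the final "& ~v & HIGHS" step can be shared across the three tests. It also reads "not matching signed bytes" as "the input has no byte >= 0x80", in which case the "& ~x" term can be dropped. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Classic test: nonzero if and only if some byte of v is zero. */
static uint64_t haszero(uint64_t v) { return (v - ONES) & ~v & HIGHS; }

/* Loads the first 8 bytes of s into a word. */
uint64_t load8(const char *s) {
    uint64_t x;
    memcpy(&x, s, sizeof x);
    return x;
}

/* Plain version: three independent HASVALUE tests. */
int word_has_space_plain(uint64_t x) {
    return (haszero(x ^ (ONES * ' ')) | haszero(x ^ (ONES * '\n')) |
            haszero(x ^ (ONES * '\r'))) != 0;
}

/* Deferred version: do the three subtractions, then mask once. */
int word_has_space_combined(uint64_t x) {
    uint64_t a = (x ^ (ONES * ' ')) - ONES;
    uint64_t b = (x ^ (ONES * '\n')) - ONES;
    uint64_t c = (x ^ (ONES * '\r')) - ONES;
    return ((a | b | c) & ~x & HIGHS) != 0;
}

/* Only valid when every byte of x is below 0x80: "& ~x" is then a no-op
   on the high bits and can be removed. */
int word_has_space_ascii(uint64_t x) {
    uint64_t a = (x ^ (ONES * ' ')) - ONES;
    uint64_t b = (x ^ (ONES * '\n')) - ONES;
    uint64_t c = (x ^ (ONES * '\r')) - ONES;
    return ((a | b | c) & HIGHS) != 0;
}
```

These tests answer "does the word contain whitespace at all"; because of borrows in the subtraction, the set bits do not reliably mark which bytes matched, so a despace64 built on them still needs a separate step to locate and remove the bytes.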
