This is rather important. Many platforms don't have SIMD at all. Original pre-Intel Hy


Reenable C backend for non-SIMD platforms about vectorscan HOT 35 CLOSED

markos commented on August 12, 2024 2

Reenable C backend for non-SIMD platforms

from vectorscan.

Comments (35)

markos commented on August 12, 2024 2

Ok, it's slower of course, but out of 20k unit tests, only 4 failures, not bad at all! And easy to fix from what it seems. Thanks for the suggestions.

from vectorscan.

markos commented on August 12, 2024 1

This is not decided yet, but it's a suggestion worth investigating. I will be looking at the whole SIMD approach soon, so SIMDe is an obvious choice. The biggest problem I've had with similar approaches is that they are too x86-centric, and that is bad for every other platform (e.g. emulating movemasks on non-x86).
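To make the movemask concern concrete, here is a minimal plain-C sketch of what an x86 byte movemask (`_mm_movemask_epi8`) computes. The helper name `scalar_movemask_u8` is hypothetical, and this is only an illustration, not how SIMDe or vectorscan implement it. On x86 this is a single instruction; elsewhere it has to be rebuilt from several operations, which is where the cost comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: collect the top bit of each of 16 bytes into a
 * 16-bit mask, mirroring the result of x86's _mm_movemask_epi8. */
uint16_t scalar_movemask_u8(const uint8_t v[16]) {
    assert(v != NULL);
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++) {
        /* (v[i] >> 7) is 1 when the byte's most significant bit is set */
        mask |= (uint16_t)((v[i] >> 7) << i);
    }
    return mask;
}
```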

from vectorscan.

markos commented on August 12, 2024 1

Furthermore, I should clarify that the C backend will be 2-fold. First a C backend for SIMD, but also reinstating pure scalar algorithms. Both have to be done.

from vectorscan.

mr-c commented on August 12, 2024 1

The easy way to enable portability with SIMDe is to replace #include <x86intrin.h> with

#define SIMDE_ENABLE_NATIVE_ALIASES 1
#include <simde/x86/avx512.h>

and add SIMDe to your include path.

To force the use of the non-optimized implementations, define SIMDE_NO_NATIVE before the include (or on the compiler command line).
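A minimal sketch of that include swap, under some assumptions: it uses `simde/x86/sse2.h` rather than the AVX-512 header to stay small, it guards the include with `__has_include` so the file still builds when SIMDe is not on the include path, and `match_mask16` and `EXAMPLE_HAVE_SIMDE` are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: SIMDe may or may not be installed; fall back to a scalar loop if not. */
#if defined(__has_include)
#  if __has_include(<simde/x86/sse2.h>)
#    define SIMDE_NO_NATIVE              /* force SIMDe's portable implementations */
#    define SIMDE_ENABLE_NATIVE_ALIASES  /* keep the usual _mm_* names */
#    include <simde/x86/sse2.h>
#    define EXAMPLE_HAVE_SIMDE 1         /* hypothetical marker for this example */
#  endif
#endif

/* Hypothetical example: bitmask of the positions in a 16-byte block equal to c. */
unsigned match_mask16(const uint8_t *block, uint8_t c) {
    assert(block != NULL);
#if defined(EXAMPLE_HAVE_SIMDE)
    /* Same source as an SSE2 build; SIMDe supplies the intrinsics. */
    __m128i data   = _mm_loadu_si128((const __m128i *)block);
    __m128i needle = _mm_set1_epi8((char)c);
    return (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(data, needle));
#else
    unsigned mask = 0;
    for (int i = 0; i < 16; i++) {
        if (block[i] == c) mask |= 1u << i;
    }
    return mask;
#endif
}
```

Both paths are meant to return the same mask, so the calling code does not need to know which one was compiled in.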

from vectorscan.

markos commented on August 12, 2024 1

@mr-c so, this proved to be easier than I expected, I now have a working SIMDe-based backend that I'm testing on a platform without supported SIMD (Loongson64). Actually running the unit tests now, if everything works, I'll test the SIMDe backend on other arches as well and compare performance.

from vectorscan.

markos commented on August 12, 2024 1

@mr-c Benchmarks will follow soon in the wiki, but I noticed something very interesting, enabling the SIMDe SSE4.2 native backend for Power was consistently ~20% faster than my native VSX port :)
It was the other way around for Neon though :)

In any case, the best thing is that it allows vectorscan to run on SIMD-less architectures, thanks for a great library!

from vectorscan.

victorjulien commented on August 12, 2024 1

> Does this mean that vectorscan should work on essentially all architectures too? E.g. something like Risc V or Mips? Trying to see if in a project like Suricata we can go "all in" on the vectorscan API w/o the need for fallback code for platforms/architectures where vectorscan may not be available.
>
> @victorjulien It means exactly that. I was able to run Vectorscan on a Loongson system, which does have a SIMD unit that is not yet supported in vectorscan (there is a PR pending). In fact, this is exactly the platform the port was developed on, to make sure it does not accidentally execute any native SIMD instructions. Of course it will be slower, but it means you can have a consistent API. And when native support is added in SIMDe, we can enable it with a single compile flag. I would still go for the native ports eventually, but having SIMDe means it will work out of the box initially.

Amazing work, thanks!

from vectorscan.

mr-c commented on August 12, 2024

@markos Will you be using SIMDe for the C backend? Please let us know if we are missing any needed intrinsics and I'll try to fast track them.

from vectorscan.

markos commented on August 12, 2024

@mr-c I'd like to evaluate SIMDe as it's next on my to-do list. You mentioned it also has a C backend. Could you please point me to how to enable it? Also, one aspect that we would like to have is emulation of wider vectors, e.g. emulating 256-bit/512-bit SIMD on NEON or VSX. Is that available?

from vectorscan.

mr-c commented on August 12, 2024

@markos Yes, SIMDe works with C/C++ codebases. See https://github.com/simd-everywhere/simde#usage
When I'm adapting a typical SIMD-using x86-64 codebase for Debian, I use these notes: https://wiki.debian.org/SIMDEverywhere#Approach

SIMDe implementations of the 256- and 512-bit AVX/AVX2/AVX512 x86-64 intrinsics often (but not always) have optimized NEON and VSX versions that are selected automatically when compiling on those platforms.
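The general idea behind emulating a wider vector can be sketched in plain C as a 256-bit value built from two 128-bit halves, with each operation applied to both halves. The types and function names below (`vec128`, `vec256`, `vec256_and`, and so on) are hypothetical and are not SIMDe's actual types; a real implementation would map each half to a NEON or VSX register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: a 128-bit byte vector and a 256-bit vector made of two. */
typedef struct { uint8_t b[16]; } vec128;
typedef struct { vec128 lo, hi; } vec256;

vec128 vec128_and(vec128 x, vec128 y) {
    vec128 r;
    for (int i = 0; i < 16; i++) r.b[i] = x.b[i] & y.b[i];
    return r;
}

/* A 256-bit AND is two independent 128-bit ANDs. */
vec256 vec256_and(vec256 x, vec256 y) {
    vec256 r;
    r.lo = vec128_and(x.lo, y.lo);
    r.hi = vec128_and(x.hi, y.hi);
    return r;
}

/* Broadcast one byte to all 32 lanes. */
vec256 vec256_set1(uint8_t v) {
    vec256 r;
    for (int i = 0; i < 16; i++) { r.lo.b[i] = v; r.hi.b[i] = v; }
    return r;
}

/* Read lane i (0..31) of a 256-bit vector. */
uint8_t vec256_byte(const vec256 *v, int i) {
    assert(v != NULL && i >= 0 && i < 32);
    return i < 16 ? v->lo.b[i] : v->hi.b[i - 16];
}
```

Lane-wise operations like this split cleanly; operations that move data across the two halves (shuffles, for instance) need extra work.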

from vectorscan.

markos commented on August 12, 2024

I wasn't clear enough. I'm looking for a C backend in order to run SIMDe code on a non-SIMD platform, or on one that does not currently have SIMD support enabled. Essentially a way to emulate SIMD using plain C. This would enable vectorscan to run on platforms without current SIMD support.

Reason: to enable running on very new architectures without SIMD support.

from vectorscan.

mr-c commented on August 12, 2024

Maybe a definition will help:

"SIMDe" == the SIMD Everywhere drop in header-only library that implements SIMD intrinsics (SSE*, AVX*, NEON, etc..) on various architectures.

SIMD Everywhere is the "way to emulate SIMD using plain C" that you are asking for :-)

from vectorscan.

markos commented on August 12, 2024

that is great to hear! So how do I enable just the C backend? Assuming I have a system with non-supported SIMD, or one with supported SIMD but I disable that support. If I integrate vectorscan with simde, is there a define that forces the C backend? Essentially that's what I'm asking.

from vectorscan.

markos commented on August 12, 2024

Initial implementation added: https://github.com/VectorCamp/vectorscan/tree/feature/enable-simde-backend
Need to test if it works on x86/arm/ppc64le architectures and also add an extra flag (SIMDE_NATIVE?) to enable alternative code for native paths for those architectures and compare performance between SIMDe and vectorscan's native implementation.

All tests pass.

from vectorscan.

mr-c commented on August 12, 2024

@markos Super cool!

As for flags, if you set the architecturally appropriate equivalent of -march=native (or -mcpu={what CPU you actually have}), then SIMDe should pick up on the features available.

On GCC/clang, please add -fopenmp-simd -DSIMDE_ENABLE_OPENMP to your CFLAGS/CXXFLAGS.

And we recommend -O3 as well, but I think you have that already.

from vectorscan.

markos commented on August 12, 2024

Arch detection is done separately. I am testing it now on an Arm64 system and, no surprise, there are a lot of build failures as it competes with existing definitions, but I should get these fixed quickly. Similarly, I will do the tests for the other architectures. As for OpenMP, I will leave this out, at least for now; threading is not supposed to be done internally within Vectorscan.

from vectorscan.

mr-c commented on August 12, 2024

-fopenmp-simd doesn't bring in the OpenMP runtime nor threading; it helps the compiler make use of the OpenMP loop vectorization hints we have in the SIMDe codebase (in case we didn't come up with an optimized implementation for a particular intrinsic for the given architecture)

See https://github.com/simd-everywhere/simde#openmp-4-simd for a fuller explanation
https://www.openmp.org/spec-html/5.0/openmpsu42.html
https://github.com/simd-everywhere/simde/blob/471a34285aa6909d5b9b9ff3dcebfa6acf3bce47/simde/simde-common.h#L355-L371

https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html

The -fopenmp-simd flag can be used to enable a subset of OpenMP directives that do not require the linking of either the OpenMP runtime library or the POSIX threads library.

https://clang.llvm.org/docs/UsersManual.html#openmp-features

Use -fopenmp-simd to enable OpenMP simd features only, without linking the runtime library; for combined constructs (e.g. #pragma omp parallel for simd) the non-simd directives and clauses will be ignored.
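A minimal sketch of the kind of loop hint this refers to, assuming a hypothetical helper `xor_bytes` (this is not code from SIMDe). The pragma only tells the compiler the iterations are independent; it creates no threads and needs no runtime library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte-wise XOR of two buffers into out. */
void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n) {
    assert(out != NULL && a != NULL && b != NULL);
    /* With -fopenmp-simd this is a vectorization hint; without the flag
     * the pragma is ignored and the loop runs as ordinary scalar code. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] ^ b[i];
    }
}
```

Compiled without the flag, GCC or clang may warn about the unknown pragma, but the result is the same.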

from vectorscan.

markos commented on August 12, 2024

I see, I will check this out then, thanks for the clarification!

from vectorscan.

markos commented on August 12, 2024

ok, compilation is fixed but I'm getting many failing tests on Arm/SIMDe, I will need to investigate these, it's probably something simple.

from vectorscan.

markos commented on August 12, 2024

Fixed in #203

from vectorscan.

victorjulien commented on August 12, 2024

Does this mean that vectorscan should work on essentially all architectures too? E.g. something like Risc V or Mips? Trying to see if in a project like Suricata we can go "all in" on the vectorscan API w/o the need for fallback code for platforms/architectures where vectorscan may not be available.

from vectorscan.

markos commented on August 12, 2024

> Does this mean that vectorscan should work on essentially all architectures too? E.g. something like Risc V or Mips? Trying to see if in a project like Suricata we can go "all in" on the vectorscan API w/o the need for fallback code for platforms/architectures where vectorscan may not be available.

@victorjulien It means exactly that. I was able to run Vectorscan on a Loongson system, which does have a SIMD unit that is not yet supported in vectorscan (there is a PR pending). In fact, this is exactly the platform the port was developed on, to make sure it does not accidentally execute any native SIMD instructions. Of course it will be slower, but it means you can have a consistent API. And when native support is added in SIMDe, we can enable it with a single compile flag. I would still go for the native ports eventually, but having SIMDe means it will work out of the box initially.

from vectorscan.

markos commented on August 12, 2024

The only thing left to implement to ensure that it can run on all platforms is big-endian (BE) support. This is also being considered but not decided yet.
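As a general illustration of what endianness-safe code looks like (not a statement about what vectorscan itself needs to change), one common pattern is to assemble multi-byte values from individual bytes instead of casting pointers. The helpers `load_le32` and `host_is_big_endian` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value byte by byte,
 * so the result is the same on little- and big-endian hosts. */
uint32_t load_le32(const uint8_t *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
int host_is_big_endian(void) {
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}
```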

from vectorscan.

Jc2k commented on August 12, 2024

Does this work with fat runtime?

from vectorscan.

markos commented on August 12, 2024

> Does this work with fat runtime?

It could, but there is little point in enabling fat runtime for it on the current architectures. The only cases I can think of are platforms where a SIMD unit is optional, e.g. 32-bit Arm, where Neon is optional, or 32-bit PowerPC, where Altivec is optional. But then again, these 32-bit architectures are not supported anyway, and we are not sure we will continue supporting 32-bit in general unless there is a valid use case for it. Possibly RISC-V as well, but I don't have actual RISC-V hardware with RVV to test on here anyway. Is there a particular use case you have in mind?

from vectorscan.

Jc2k commented on August 12, 2024

I think there are amd64 chips that are supported by distros (e.g. RH technically compiles for the very first amd64 chip) that have SSE2 and not SSE3, and iirc hyperscan targets the "core2" baseline (SSE3) as a minimum?

Unless vectorscan has an SSE2 backend (sorry, could have easily missed it!) then I guess enabling this in fat runtime would technically mean that amd64 vectorscan would work on the same CPUs that distros like RH work on, even if most people do have SSE3 as a minimum? Certainly not groundbreaking, but potentially valuable to packagers.

Not a use case for me personally, the oldest thing I have is SSE3.

from vectorscan.

mr-c commented on August 12, 2024

Yes, for Debian amd64 we have to support SSE2-only systems (runtime detection, CPU dispatch, etc. for higher levels is okay, of course).

https://wiki.debian.org/ArchitectureSpecificsMemo#Architecture_baselines
https://wiki.debian.org/InstructionSelection
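A rough sketch of the runtime detection and dispatch pattern mentioned here, assuming GCC or clang on x86 for the `__builtin_cpu_supports` check. All function names are hypothetical, and `count_byte_sse42` is only a stand-in that reuses the portable code; in a real fat runtime it would be the same routine compiled for SSE4.2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef size_t (*count_fn)(const uint8_t *, size_t, uint8_t);

/* Baseline implementation that runs on any CPU. */
size_t count_byte_portable(const uint8_t *buf, size_t n, uint8_t c) {
    assert(buf != NULL || n == 0);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) hits += (buf[i] == c);
    return hits;
}

/* Stand-in for a build of the same routine targeting SSE4.2. */
size_t count_byte_sse42(const uint8_t *buf, size_t n, uint8_t c) {
    return count_byte_portable(buf, n, c);
}

/* Pick an implementation once, based on what the running CPU reports. */
count_fn select_count_byte(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2")) return count_byte_sse42;
#endif
    return count_byte_portable;
}
```

The caller stores the returned pointer and uses it for every later call, so the CPU check happens only once.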

from vectorscan.

markos commented on August 12, 2024

@Jc2k I see your point, yes, if 32-bit i386 is to be supported, then indeed we could drop the baseline to SSE2 so as to support the older chips. We will consider 32-bit in general and it should be decided for next release.

from vectorscan.

Jc2k commented on August 12, 2024

So if you decide to not support 32-bit i386, then you will also at the same time decide not to support 64-bit x86_64 chips that don't have SSE3?

from vectorscan.

markos commented on August 12, 2024

It's not that simple; there are more things than SIMD that involve special casing. But in short, given that AVX2 is more than 15 years old, the thought of raising the baseline dependency has crossed my mind, yes. :)

from vectorscan.

Jc2k commented on August 12, 2024

Fedora discussed exactly that this last cycle. In short most of their developers didn't meet that baseline. SSE3 was considered more reasonable, but for now they are sticking with SSE2 for 64-bit.

Fedora is upstream of RH, so it's going to be a looong time before RH people even benefit from SSE3 in distro-provided packages.

😅

from vectorscan.

markos commented on August 12, 2024

But vectorscan is not a distro nor do we have the resources to support all the possible configurations, even the current supported list in our CI is more than many other projects currently do: https://buildbot-ci.vectorcamp.gr/#/grid
We just added SIMDe + SIMDe native configurations for every architecture in that list, and we expect to have Loongson in there soon, plus others in the near future: RISC-V, MIPS. However, the situation with Intel is already too complicated. For this reason we are considering limiting the options for x86 to just AVX2/AVX512, and leaving SSE2-SSE4.2 for 32-bit CPUs only, IF 32-bit support stays.

from vectorscan.

markos commented on August 12, 2024

in any case, these are just thoughts at the moment, and this ticket is not really the best place for this discussion :)

from vectorscan.

Jc2k commented on August 12, 2024

Thank you. That last paragraph was the bit I was missing. I couldn't understand what 32-bit had to do with my question.

So if I understand your plans correctly you may drop support for SSE3 entirely and you do not plan to add SIMDe to fat runtime on x86_64. That's very useful to know.

Thanks for answering.

from vectorscan.

markos commented on August 12, 2024

"may" is the keyword here. But regarding SIMDe on x86_64 fat runtime, I don't know if there is a reason for that, I am not aware of any widely available 64-bit CPU that lacks SSE4.2 at the moment. Note the "widely available", we are talking about almost 20 years old tech here. AVX2 is here since 2008. But please use another ticket for this if you think it should be supported/discussed.

from vectorscan.
