Comments (17)

mdb256 commented on May 26, 2024

Yes, the difference is that the two machines support different instruction set extensions.

The machine you used to compile the HS library has a 4th Gen i7 (which goes by the codename Haswell). This machine supports AVX2 and BMI1/2 instructions, and when we compile with -march=native, GCC will emit AVX2/BMI2 instructions. These instructions will not work on the Xeon you have (codenamed Ivy Bridge), as it supports AVX, but not AVX2 or BMI2. This is the cause of the SIGILL.

To fix this, you'll either need to compile Hyperscan on the Xeon, or configure the build on the first machine by passing in the correct -march=<xx> flags for your compiler. Different versions of GCC support differing arguments to this flag.
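
As a minimal sketch of how to check which of the two cases a given machine falls into, the following uses the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The function names `host_has_avx2` and `host_has_ssse3` are hypothetical, and the code assumes an x86 target; on other targets it simply reports `false`.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the CPU running this code reports AVX2. */
bool host_has_avx2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                      /* populate the CPU model data */
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;                              /* assumption: non-x86 has no AVX2 */
#endif
}

/* Hypothetical helper: true when the CPU running this code reports SSSE3. */
bool host_has_ssse3(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
#else
    return false;
#endif
}
```

If `host_has_avx2()` returns false on the machine that will run the library, a build made with -march=native on a Haswell host is likely to hit SIGILL there.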

from hyperscan.

crazy-william commented on May 26, 2024

Hi mdb256, I work with TidyHuang.

I have a question: where do you find the generation and supported features of a CPU type? When we use the -march flag, we should use the CPU type of the machine that will run the program, is that right? Is there a way to make the program run on all Intel x86_64 CPUs? I could not find one in the gcc manpage.
Thank you very much!

Our gcc version is:

gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Through "man gcc", I got this:

           i686
               When used with -march, the Pentium Pro instruction set is used, so the code runs on all i686 family chips.  When used with -mtune, it has the same meaning as
               generic.

           pentium2
               Intel Pentium II CPU, based on Pentium Pro core with MMX instruction set support.

           pentium3
           pentium3m
               Intel Pentium III CPU, based on Pentium Pro core with MMX and SSE instruction set support.

           pentium-m
               Intel Pentium M; low-power version of Intel Pentium III CPU with MMX, SSE and SSE2 instruction set support.  Used by Centrino notebooks.

           pentium4
           pentium4m
               Intel Pentium 4 CPU with MMX, SSE and SSE2 instruction set support.

           prescott
               Improved version of Intel Pentium 4 CPU with MMX, SSE, SSE2 and SSE3 instruction set support.

           nocona
               Improved version of Intel Pentium 4 CPU with 64-bit extensions, MMX, SSE, SSE2 and SSE3 instruction set support.

           core2
               Intel Core 2 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.

           corei7
               Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 instruction set support.

           corei7-avx
               Intel Core i7 CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES and PCLMUL instruction set support.

           core-avx-i
               Intel Core CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AES, PCLMUL, FSGSBASE, RDRND and F16C instruction set support.

           core-avx2
               Intel Core CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2 and F16C
               instruction set support.

           atom
               Intel Atom CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3 and SSSE3 instruction set support.

Thanks a lot.

mdb256 commented on May 26, 2024

Yes, you should choose the architecture of the machine that you plan to run Hyperscan on.

The list from GCC 4.8 is somewhat confusing, and I should note it has changed in newer versions of GCC, but the minimum feature set required for Hyperscan is core2. Your machines are very likely to support more than the minimum, and most likely also support SSE4.1/4.2, which this list confusingly names corei7.

The Xeon v2 that was mentioned earlier is covered by core-avx-i - the extra features that this includes do allow some performance improvements in Hyperscan over the baseline of core2.

The Haswell that you first built Hyperscan on would be using the feature sets from core-avx2. Again, there are performance improvements from using more recent features, but if they aren't available on the machines you will be using, then you cannot build the library with these instructions.

crazy-william commented on May 26, 2024

OK, thanks again for the quick reply! It's clear now.

TidyHuang commented on May 26, 2024

Hi Matt,

Thanks for your kind and detailed answer.
Based on your suggestion, I've partially fixed my crash issue. I then ran several experiments.
My project has one executable that depends on several dynamic libraries, e.g. lib1.so, lib2.so, lib3.so ... libn.so and libhs.so. These libraries are independent of each other, and lib1.so and lib2.so are preinstalled on the target VM (built without extra compiler flags).
1) The first time, I built my project on a 4th Gen i7 (codename Haswell) with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS set to -march=core-avx-i -march=generic for the executable and the dependent libraries lib3.so ... libn.so and libhs.so. The program then ran well on the Xeon (codenamed Ivy Bridge).
2) I then built my project with CMAKE_C_FLAGS and CMAKE_CXX_FLAGS set to -march=core-avx-i -march=generic, applied to only some of the libraries or only the executable. The program crashed as before, which is very weird to me.

My questions:
a) Why did the first build run without crashing, given that in all the experiments the project depends on the preinstalled dynamic libraries lib1.so and lib2.so?
b) What is the scope of -march=core-avx-i -march=generic? That is, must the program that uses Hyperscan and all of its dependent libraries be compiled with the "-march=core-avx-i -march=generic" flags?

-Tidy

mdb256 commented on May 26, 2024

I'm not quite sure I understand. Firstly I suspect you mean "-march=core-avx-i -mtune=generic". Specifying -march twice will usually mean the second one overrides the first. Also generic is not a valid argument for -march=.

If you compile Hyperscan with -march=core-avx-i it should not affect any other library. Is it possible there is still a version of the Hyperscan library built on the Haswell with -march=native in the dynamic library path?
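
One way to confirm which instruction-set flags actually took effect for a given source file is to inspect the macros the compiler predefines. The sketch below assumes GCC or Clang, which define `__AVX2__`, `__AVX__`, `__SSE4_2__` and `__SSSE3__` when the matching extension is enabled; the function name `compiled_isa_level` is hypothetical.

```c
/* Hypothetical helper: reports the widest ISA level this translation unit
   was compiled for, based on GCC/Clang predefined macros. It says nothing
   about the CPU the code is running on. */
const char *compiled_isa_level(void) {
#if defined(__AVX2__)
    return "avx2";
#elif defined(__AVX__)
    return "avx";
#elif defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSSE3__)
    return "ssse3";
#else
    return "baseline";
#endif
}
```

Built with -march=core-avx-i this would be expected to return "avx", and with -march=native on a Haswell host "avx2". If a library that is supposed to be the core-avx-i build still reports "avx2", a stale copy is probably being picked up.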

TidyHuang commented on May 26, 2024

Thanks Matt, "-march=generic" was a typo for -mtune=generic. In theory, there should be no existing library built with -march=native; two of us have done such testing. I'll use a clean VM to test and separate the dependent libraries one by one.

starius commented on May 26, 2024

Can hyperscan detect CPU features in runtime?

When I compile hyperscan with -mavx2 and then disable AVX2 in hs_platform_info passed to hs_compile, I still get SIGILL in function getMask on machine without AVX2 support. Is it expected behaviour? If so, what is the purpose of hs_platform_info structure?

mdb256 commented on May 26, 2024

The hs_platform_info structure is for the Hyperscan compiler, and allows the HS compiler to determine which engines should be chosen while it builds the pattern database. Modifying hs_platform_info is the equivalent of cross-compiling when using hs_compile.

Compiling the Hyperscan lib with -mavx2 means that the C/C++ compiler is free to generate AVX/2 instructions, VEX encoded SSE instructions, and use ymm registers - and these can occur at any time during execution, and not in any way that Hyperscan could detect or avoid on non-AVX2 platforms.

starius commented on May 26, 2024

@mdb256, is it possible to enable -mavx2 only on files using AVX2? I see files *_avx2.c in hyperscan source tree. If only *_avx2.c are compiled with -mavx2, then it will be possible to build universal libhs.so file and delegate the decision of using AVX2 to hs_platform_info, not to cmake options. In this case one can put this universal libhs.so on 2 types of machines (with AVX2 and without AVX2), compile regular expressions to 2 bytecodes (AVX2-enabled bytecode and AVX2-disabled bytecode) using hs_platform_info and select in runtime what bytecode to use using runtime information of whether AVX2 is available for that machine.

Note: there are actually more than 2 instruction sets, so replace 2 with actual number.

Note 2: it would be even better if hyperscan had cross-platform bytecode.

starius commented on May 26, 2024

Something like this:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 6710979..2a398e4 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -184,10 +184,10 @@ else()

     if (NOT CMAKE_C_FLAGS MATCHES .*march.*)
         message(STATUS "Building for current host CPU")
-        set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -march=native -mtune=native")
+        set(EXTRA_C_FLAGS "${EXTRA_C_FLAGS} -mssse3")
     endif()
     if (NOT CMAKE_CXX_FLAGS MATCHES .*march.*)
-        set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -march=native -mtune=native")
+        set(EXTRA_CXX_FLAGS "${EXTRA_CXX_FLAGS} -mssse3")
     endif()

     if(CMAKE_COMPILER_IS_GNUCC)
@@ -540,13 +540,16 @@ set (hs_exec_SRCS
     src/database.h
 )

-if (HAVE_AVX2)
     set (hs_exec_SRCS
         ${hs_exec_SRCS}
         src/fdr/teddy_avx2.c
         src/util/masked_move.c
         )
-endif ()
+    set_source_files_properties(
+        src/fdr/teddy_avx2.c
+        src/util/masked_move.c
+        PROPERTIES COMPILE_FLAGS -mavx2
+        )


 SET (hs_SRCS

mdb256 commented on May 26, 2024

Unfortunately it isn't as simple as just building some of the files with avx2 - those two files are only required for avx2 builds, but there are many more places we use avx2 instructions where they are available.

Similarly in the Hyperscan lib we use a mix of other microarch additions where we can, like sse4.2 (crc32), popcnt, bmi2 (pext, pdep), and more.

We have looked at building a "fat binary", or as you say a universal lib that supports as many different microarchitectures as required - but it is going to take some time, and has portability problems. Plus we need to be careful about mixing SSE and AVX instructions, as switching between them can incur expensive performance penalties.

sadegh01 commented on May 26, 2024

Which flag makes it executable on the full range of hardware?

starius commented on May 26, 2024

@sadegh01, -DCMAKE_C_FLAGS="-march=core2" -DCMAKE_CXX_FLAGS="-march=core2" works for me.

StefanBruens commented on May 26, 2024

As the hs core is written in C++ (as far as I can see), wouldn't use of function multiversioning https://gcc.gnu.org/wiki/FunctionMultiVersioning be applicable here?

mdb256 commented on May 26, 2024

FMV seems to be a popular topic lately. I spent a while trying to make it work for Hyperscan, but it wasn't the right fit.

We have a working version of the fat runtime that I mentioned above. It is still a bit experimental, but I'll be pushing the commits soon. It works by building n copies of the runtime code (the C, not the C++) and uses the indirect function (ifunc) attribute to dispatch to the right API function based on what the host platform supports.
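
The dispatch idea can be sketched as follows. This is a toy illustration and not Hyperscan's actual code: `sum_bytes`, its two variants and `resolve_sum_bytes` are hypothetical names. It assumes GCC or Clang on x86-64 Linux with glibc, where the `ifunc` and `target` attributes are available, and falls back to the generic build elsewhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic build of a hypothetical runtime function. */
static uint64_t sum_bytes_generic(const uint8_t *p, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    return s;
}

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)

/* Second copy of the same code; the compiler may use AVX2 in this body only. */
__attribute__((target("avx2")))
static uint64_t sum_bytes_avx2(const uint8_t *p, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    return s;
}

typedef uint64_t (*sum_bytes_fn)(const uint8_t *, size_t);

/* Resolver: run once by the dynamic loader to pick an implementation. */
static sum_bytes_fn resolve_sum_bytes(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? sum_bytes_avx2 : sum_bytes_generic;
}

/* Public symbol, bound to whichever copy the resolver returned. */
uint64_t sum_bytes(const uint8_t *p, size_t n)
    __attribute__((ifunc("resolve_sum_bytes")));

#else

/* Fallback for platforms without ifunc: always use the generic copy. */
uint64_t sum_bytes(const uint8_t *p, size_t n) {
    return sum_bytes_generic(p, n);
}

#endif
```

Callers only ever see `sum_bytes`, so the same binary can be installed on machines with and without AVX2.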

mdb256 commented on May 26, 2024

Hyperscan v4.4 includes the fat runtime work for Linux, and this issue is becoming a collection of somewhat related items.

I'm going to close this issue, but please open a new issue or contact us directly if there are any problems.
