
bolt's People

Contributors

adcastel, alexey-bataev, andreychurbanov, atoker, chandlerc, devnexen, dimitryandric, doru1004, grokos, guansong, hahnjo, hansangbae, jcownie-intel, jdenny-ornl, jdoerfert, jhuber6, jonchesterfield, jpeyton52, jprotze, jsonn, kkwli, krytarowski, mgorny, nawrinsu, pawosm-arm, shiltian, shintaro-iwasaki, sylvestre, ye-luo, zmodem

bolt's Issues

inefficient "ordered" scheduling

An ordered loop is not optimized well, which sometimes causes a timeout in test/worksharing/for/omp_for_collapse.c when execution streams are oversubscribed.

  #pragma omp parallel for ordered
  for (i = 1; i < 10000; i++) {
    #pragma omp ordered
    comp(i);
  }

Although ordered is not widely used, it should be optimized by using synchronization primitives.

Segfault in ABT_mutex_lock

Environment:

  • LLVM 3.9.1
  • BOLT 1.0a1
  • Argobots 1.0a1

Reproducible code:
bolt_abt_mtx_segf.txt

Error message:

[mutex.c:172] ABT_mutex_lock: 20
Segmentation fault (core dumped)

GDB output:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f64ca2426c2 in ABT_mutex_create ()
   from /usr/local/llvm-bolt-argobots/lib/../lib/libabt.so.0
(gdb) where
#0  0x00007f64ca2426c2 in ABT_mutex_create ()
   from /usr/local/llvm-bolt-argobots/lib/../lib/libabt.so.0
#1  0x00007f64caa75300 in __kmp_do_serial_initialize ()
    at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_runtime.c:139
#2  0x00007f64caa75a5a in __kmp_do_middle_initialize ()
    at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_runtime.c:363
#3  0x00007f64caa75ddb in __kmp_middle_initialize ()
    at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_runtime.c:470
#4  0x00007f64caa99362 in __kmp_api_omp_get_num_procs ()
    at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_ftn_entry.h:444
#5  0x0000000000400b09 in main ()

Are task dependencies (depend clauses) in bolt working?

This could be user error on my part. I have a simple program in which I am trying to make T2 execute after T1, because T1 writes x, which is an input to T2. However, x is not set to 2 before T2 runs. Here is the code. It fails the same way with both libgomp and libbolt, and with both clang and gcc 7.5.

#include <stdio.h>
#include <omp.h>
int foo() {
   int x = 1;
   int x_is_not_equal_two=0;
   #pragma omp task depend(in:x) shared(x_is_not_equal_two,x)
   {
      if (x != 2) {
         x_is_not_equal_two = 1;
         printf(" T2: INPUT depend clause x should now be 2  x:%d\n",x);
      }
   }
   #pragma omp task depend(out:x) shared(x)
   {
      printf(" T1: OUTPUT setting x to 2\n");
      x=2;
   }
   printf("before taskwait x:%d\n",x);
   #pragma omp taskwait
   printf("after  taskwait x:%d  x  was not equal 2 in T2:%d \n", 
      x, x_is_not_equal_two);
   return x_is_not_equal_two;
}

int main() {
   int rc=0;
   omp_set_num_threads(2);
   #pragma omp parallel
   #pragma omp single nowait
   rc = foo();
   printf("rc:%d\n",rc);
   return rc;
}

BOLT does not seem to work well with the dense linear algebra library PLASMA

Hi, I would like to take advantage of lightweight thread libraries, e.g., Argobots, and runtime systems built on them, e.g., BOLT. I have successfully configured and built the library, but I need clarification on the output libraries: libgomp.so, libiomp5.so, and libomp.so. Which one should I use? Do the three libraries correspond to the GCC OpenMP runtime, the Intel OpenMP runtime, and BOLT, respectively?

  • Configuration Command
cmake .. -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/home/fx/lib/bolt -DCMAKE_C_COMPILER=/home/fx/.local/gcc-11.2.0/bin/gcc -DCMAKE_CXX_COMPILER=/home/fx/.local/gcc-11.2.0/bin/g++ -DOPENMP_TEST_C_COMPILER=/home/fx/.local/gcc-11.2.0/bin/gcc -DOPENMP_TEST_CXX_COMPILER=/home/fx/.local/gcc-11.2.0/bin/g++ -DCMAKE_BUILD_TYPE=Release -DLIBOMP_USE_ARGOBOTS=on -DLIBOMP_ARGOBOTS_INSTALL_DIR=/home/fx/lib/argobots/ | tee c.txt
  • Build Command
make V=1 VERBOSE=1 2>&1 | tee m.txt
  • Installed libs
    (screenshot not available)

Failure configuring with internal libabt

Trying to build BOLT with internal Argobots on current master results in a CMake error:

$ cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME/opt/bolt-git -DLIBOMP_USE_ARGOBOTS=on
...
CMake Error at /usr/share/cmake-3.10/Modules/ExternalProject.cmake:2474 (message):
  No download info given for 'libabt' and its source directory:

   ~/src/bolt/git/bolt/external/argobots

  is not an existing non-empty directory.  Please specify one of:

   * SOURCE_DIR with an existing non-empty directory
   * DOWNLOAD_COMMAND
   * URL
   * GIT_REPOSITORY
   * SVN_REPOSITORY
   * HG_REPOSITORY
   * CVS_REPOSITORY and CVS_MODULE
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/ExternalProject.cmake:3029 (_ep_add_download_command)
  external/CMakeLists.txt:32 (ExternalProject_Add)

Add `#include <stdlib.h>` to examples

I get this warning when I try to compile any of the examples.

 warning: incompatible implicit declaration of built-in function 'malloc'
     double *a = (double *)malloc(sizeof(double)*num);

I was able to fix it by adding #include <stdlib.h> to the examples.

To-do list for BOLT 1.0 release

  • Upgrade the baseline LLVM/OpenMP to version 10.0
  • Upgrade the embedded Argobots to version 1.0
  • Support non-x86/64 architectures (including POWER (#48, need further check) and 64-bit ARM)
  • Fix reported issues (#46, #49, #51), which will be covered by #69.

Desire to build bolt with different install directory for omp.h

This simple patch allows the builder of bolt to put omp.h in a different directory. I am working with a more current LLVM omp.h than what bolt builds. With this patch, I can install bolt into the LLVM installation and not clobber omp.h.

git diff
diff --git a/runtime/src/CMakeLists.txt b/runtime/src/CMakeLists.txt
index 042a5200..8a237805 100644
--- a/runtime/src/CMakeLists.txt
+++ b/runtime/src/CMakeLists.txt
@@ -329,7 +329,9 @@ add_dependencies(bolt-libomp-micro-tests bolt-libomp-test-deps)
 # We want to install libomp in DESTDIR/CMAKE_INSTALL_PREFIX/lib
 # We want to install headers in DESTDIR/CMAKE_INSTALL_PREFIX/include
 if(${OPENMP_STANDALONE_BUILD})
-  set(LIBOMP_HEADERS_INSTALL_PATH include)
+  if(NOT LIBOMP_HEADERS_INSTALL_PATH)
+      set(LIBOMP_HEADERS_INSTALL_PATH include)
+  endif()
 else()
   string(REGEX MATCH "[0-9]+\\.[0-9]+(\\.[0-9]+)?" CLANG_VERSION ${PACKAGE_VERSION})
   set(LIBOMP_HEADERS_INSTALL_PATH "${OPENMP_INSTALL_LIBDIR}/clang/${CLANG_VERSION}/include")

FYI, below are the CMake options from our build_bolt.sh script. The above patch lets me avoid clobbering omp.h. I also turn off the aliases because I want BOLT to be user-selectable with -fopenmp=libbolt. I have a simple LLVM patch that I want to push upstream to support -fopenmp=libbolt. It does not push the BOLT sources; it simply allows the option to be used when BOLT is built as an external component.

MYCMAKEOPTS="
-DCMAKE_INSTALL_PREFIX=$BOLT_INSTALL_DIR
$AOMP_ORIGIN_RPATH
-DCMAKE_C_COMPILER=$AOMP_CC_COMPILER
-DCMAKE_CXX_COMPILER=$AOMP_CXX_COMPILER
-DOPENMP_TEST_C_COMPILER=$AOMP_CC_COMPILER
-DOPENMP_TEST_CXX_COMPILER=$AOMP_CXX_COMPILER
-DCMAKE_BUILD_TYPE=Release
-DOPENMP_ENABLE_LIBOMPTARGET=OFF
-DLIBOMP_HEADERS_INSTALL_PATH=include/bolt
-DLIBOMP_INSTALL_ALIASES=OFF
-DLIBOMP_USE_ARGOBOTS=on"

BOLT tasks and ABT_cond

I am trying to leverage low-level Argobots features inside BOLT tasks (BOLT 1.0rc3, built with internal Argobots). In particular, I would like to block a set of tasks on a condition variable and eventually unblock them from a different task, as in this example:

#include <abt.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  int n = 10;
  #pragma omp parallel
  {
    #pragma omp master
    {
      int blocked = 0;
      ABT_mutex mtx;
      ABT_cond cond;
      ABT_mutex_create(&mtx);
      ABT_cond_create(&cond);

      for (int i = 0; i < n; ++i) {
        printf("Discovering task %d\n", i);
        #pragma omp task shared(mtx, cond, blocked)
        {
          printf("Task %d blocking\n", i);
          ABT_mutex_lock(mtx);
          blocked++;
          ABT_cond_wait(cond, mtx);
          ABT_mutex_unlock(mtx);
        }
      }

      #pragma omp task shared(cond, mtx, blocked)
      {
        printf("Broadcast task starting\n");
        while (n != blocked) {
          ABT_thread_yield();
        }
        // mutex required to ensure all tasks entered cond
        ABT_mutex_lock(mtx);
        printf("Broadcast task broadcasting\n");
        ABT_cond_broadcast(cond);
        ABT_mutex_unlock(mtx);
      }

      #pragma omp taskwait
    }
  }
  return 0;
}

What I see is that all tasks are created and only the first task starts executing. Output:

$ ./test_bolt_abt_cond
Discovering task 0
Discovering task 1
Discovering task 2
Discovering task 3
Discovering task 4
Discovering task 5
Discovering task 6
Discovering task 7
Discovering task 8
Discovering task 9
Task 0 blocking

Any idea why only the first task is executing? Are the other runnable tasks not passed to Argobots? Do I need to set some environment variables to make this work?

Disable ittnotify by default in CMake

When building BOLT with Argobots, one has to set -DLIBOMP_USE_ITT_NOTIFY=off manually, which is tedious. The CMake script should handle this (i.e., turn it off by default) when Argobots is selected as the threading layer.

thread id inside omp task code management

Currently, the thread information for each OpenMP task is taken from blocked ULTs. This works for common task patterns but not for nested task code that starts with single or master constructs (like Fibonacci).

The problem is that we are duplicating the thread structures, including the task queue, so a new OpenMP task can be created either in the real thread's queue or in the "fake" one. Then, when we try to join the task, we cannot be sure where it is.

However, this issue is related to the OpenMP committee's decision on whether two OpenMP tasks can execute concurrently while sharing the same thread ID.

Untied tasks trigger assert

Compiling the following code snippet with clang 9 and running it with BOLT (1.0rc3 downloaded from website, built with Argobots support) leads to an assertion:

int main(int argc, char **argv)
{
#pragma omp parallel
#pragma omp master
{
  #pragma omp task untied
  { }
}
  return 0;
}

The assertion and backtrace:

[New Thread 0x7ffff6d1f700 (LWP 19430)]
[New Thread 0x7ffff651e700 (LWP 19431)]
[New Thread 0x7ffff5d1d700 (LWP 19432)]
Assertion failure at z_Linux_util.cpp(3952): taskdata->td_flags.complete == 0.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(3952).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.

Thread 3 "test_omp_nested" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff651e700 (LWP 19431)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff751f801 in __GI_abort () at abort.c:79
#2  0x00007ffff7b3186a in __kmp_abort_process () at bolt-1.0rc3/runtime/src/kmp_runtime.cpp:463
#3  0x00007ffff7b2f765 in __kmp_fatal (message=...) at bolt-1.0rc3/runtime/src/kmp_i18n.cpp:868
#4  0x00007ffff7b2befd in __kmp_debug_assert (msg=0x7ffff7bb0ada "taskdata->td_flags.complete == 0", file=0x7ffff7bb3c37 "z_Linux_util.cpp", line=3952)
    at bolt-1.0rc3/runtime/src/kmp_debug.cpp:74
#5  0x00007ffff7b8dec2 in __kmp_abt_execute_task (arg=0xb1ee40) at bolt-1.0rc3/runtime/src/z_Linux_util.cpp:3952
#6  0x00007ffff72d1beb in ABTD_thread_func_wrapper_thread (p_arg=0x7fffe6802f00) at ../../src/arch/abtd_thread.c:18
#7  0x00007ffff72d1fb1 in make_fcontext () at ../../src/arch/fcontext/make_x86_64_sysv_elf_gas.S:64
#8  0x00007fffe6802e00 in ?? ()
#9  0x00007ffff7b8de00 in ?? () at bolt-1.0rc3/runtime/src/z_Linux_util.cpp:3441 from ~/opt/bolt-1.0rc3/lib/libomp.so.5
#10 0x0000000000a13f80 in ?? ()
#11 0x00007fffe6fff000 in ?? ()
#12 0x0000000000000000 in ?? ()

I don't see this happening with tied tasks.

Child task management

The child task management design needs to be improved.

The first implementation checks the queued tasks from the current task to the end of the queue and frees those tasks, so we can assert that child tasks are completed before the parent task continues.

The future idea is to manage child tasks inside the task structure so that each task can check them directly without involving the thread structure.

Testsuite with Fortran and Intel compilers

Build information:

Testsuite run parameters:

  • TEST_FC=ifort
  • TEST_FFLAGS="-g -O2 -qopenmp -I/~/INSTALL/bolt/include -L/~/INSTALL/bolt/lib -Wl,-rpath=~/INSTALL/bolt/lib"

make ftest error message:

Testing for "omp_threadprivate":
Generating sources .............. success
Compiling soures ................ success
Running test with 8 threads ../bin/fortran/test_omp_threadprivate: relocation error: ./bin/fortran/test_omp_threadprivate: symbol kmp_aligned_malloc, version VERSION not defined in file libiomp5.so with link time reference
.... failed 100% of the tests

run slow when compare to -fopenmp with gcc11

Hi,

I compared BOLT + Argobots against GCC 11's -fopenmp and found BOLT to be about 2x slower.
I built Argobots and BOLT according to the guide, and both are built as shared libraries (.so).
I wonder whether this is the reason.
Is it possible to build BOLT as a static library, and to have BOLT use a static Argobots library?

The test case is the matrix multiplication benchmark from taskflow/benchmarks/matrix_multiplication/.
The compile command linking BOLT is:
g++ main.cpp omp.cpp taskflow.cpp tbb.cpp -I~/Work/tbb/include -L~/Work/tbb/build/ -ltbb -I~/Work/taskflow -I~/Work/CLI11 -I~/Work/bolt-omp/include -L~/Work/bolt-omp/lib -lbolt -L~/Work/bolt-abt/lib -labt -o test_bolt -O3
./test_bolt -t 2 -m omp

versus
the compile command using the default OpenMP runtime:
g++ main.cpp omp.cpp taskflow.cpp tbb.cpp -I~/Work/tbb/include -L~/Work/tbb/build/ -ltbb -I~/Work/taskflow -I~/Work/CLI11 -fopenmp -o test_omp -O3
./test_omp -t 2 -m omp

Any suggestions for getting better performance from BOLT would be appreciated.
Thanks.

Parallel Compilation Error

When I try to use the embedded Argobots library (external/argobots) and compile BOLT in parallel (e.g., make -j), the build fails with the following compile-time error:

$ cmake -DLIBOMP_USE_ARGOBOTS=on
...
$ make -j
Performing configure step for 'libabt'
...
bolt/runtime/src/kmp.h:23:17: fatal error: abt.h: No such file or directory
 #include <abt.h>
