pmodels / bolt
This project forked from llvm-mirror/openmp
Official BOLT Repository
Home Page: https://www.bolt-omp.org
License: Other
An ordered loop is not optimized well, which sometimes causes a timeout in test/worksharing/for/omp_for_collapse.c if execution streams are oversubscribed.
#pragma omp parallel for ordered
for (i = 1; i < 10000; i++) {
  #pragma omp ordered
  comp(i);
}
Although ordered is not widely used, it should be optimized by using synchronization primitives.
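For reference, the loop above can be wrapped into a small self-checking function. This is only a sketch: the function name `ordered_loop_in_sequence` and the `next`/`in_order` bookkeeping are illustrative assumptions and are not taken from the BOLT test suite. It checks that the ordered region is entered in iteration order, which is the behavior any optimized implementation must preserve.

```c
/* Hypothetical helper (not part of the BOLT tests): runs an ordered loop
 * and returns 1 if the ordered region was entered in iteration order. */
int ordered_loop_in_sequence(int n) {
    int next = 1;      /* iteration the ordered region expects next */
    int in_order = 1;  /* cleared if any iteration arrives out of order */
    int i;
    #pragma omp parallel for ordered
    for (i = 1; i < n; i++) {
        /* Ordered regions execute one at a time, in loop order. */
        #pragma omp ordered
        {
            if (i != next)
                in_order = 0;
            next = i + 1;
        }
    }
    return in_order;
}
```

Calling `ordered_loop_in_sequence(10000)` mirrors the loop bounds in the report; timing that call with more threads than cores is one way to observe the oversubscription slowdown.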
worksharing/for/omp_for_schedule_guided.c fails sometimes.
https://jenkins-pmrs.cels.anl.gov/job/bolt-review-centos/78/
This should be fixed.
Environment:
Reproducible code:
bolt_abt_mtx_segf.txt
Error message:
[mutex.c:172] ABT_mutex_lock: 20
Segmentation fault (core dumped)
GDB output:
Program terminated with signal 11, Segmentation fault.
#0 0x00007f64ca2426c2 in ABT_mutex_create ()
from /usr/local/llvm-bolt-argobots/lib/../lib/libabt.so.0
(gdb) where
#0 0x00007f64ca2426c2 in ABT_mutex_create ()
from /usr/local/llvm-bolt-argobots/lib/../lib/libabt.so.0
#1 0x00007f64caa75300 in __kmp_do_serial_initialize ()
at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_runtime.c:139
#2 0x00007f64caa75a5a in __kmp_do_middle_initialize ()
at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_runtime.c:363
#3 0x00007f64caa75ddb in __kmp_middle_initialize ()
at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_runtime.c:470
#4 0x00007f64caa99362 in __kmp_api_omp_get_num_procs ()
at /usr/local/src/llvm/projects/openmp/runtime/src/abt/kmp_abt_ftn_entry.h:444
#5 0x0000000000400b09 in main ()
This issue was reported in spack/spack#15550
Argobots supports an address sanitizer (pmodels/argobots#218) and seems to work with relatively new GCC and Clang (https://jenkins-pmrs.cels.anl.gov/view/abt/job/argobots-review-centos-all/). Since the tricky user-level context switching in Argobots passes the tests, BOLT should also pass under address sanitizers. The BOLT CI should use address sanitizers.
This could be my user error. I have a simple code where I am trying to get T2 to execute after T1 because T1 writes x which is input to T2. The code is not setting x to 2 before T2 runs. Here is the code. This fails the same way in both libgomp and libbolt. It fails in clang and gcc 7.5.
#include <stdio.h>
#include <omp.h>
int foo() {
  int x = 1;
  int x_is_not_equal_two=0;
  #pragma omp task depend(in:x) shared(x_is_not_equal_two,x)
  {
    if (x != 2) {
      x_is_not_equal_two = 1;
      printf(" T2: INPUT dependend clause x should now be 2 x:%d\n",x);
    }
  }
  #pragma omp task depend(out:x) shared(x)
  {
    printf(" T1: OUTPUT setting x to 2\n");
    x=2;
  }
  printf("before taskwait x:%d\n",x);
  #pragma omp taskwait
  printf("after taskwait x:%d x was not equal 2 in T2:%d \n",
         x, x_is_not_equal_two);
  return x_is_not_equal_two;
}
int main() {
  int rc=0;
  omp_set_num_threads(2);
  #pragma omp parallel
  #pragma omp single nowait
  rc = foo();
  printf("rc:%d\n",rc);
  return rc;
}
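One likely explanation, offered as a sketch rather than a confirmed diagnosis: in OpenMP, a task with `depend(in:x)` only waits for previously generated sibling tasks that name `x` in an `out` or `inout` clause. In the code above T2 is generated before T1, so no dependence orders T2 after T1. The variant below generates the writer first. The function name `run_ordered_tasks` and the variable `seen` are illustrative assumptions, not part of the original report.

```c
/* Hypothetical rewrite of the reproducer: returns the value of x that the
 * consumer task (T2) observed. */
int run_ordered_tasks(void) {
    int x = 1;
    int seen = 0;
    #pragma omp parallel
    #pragma omp single
    {
        /* T1 is generated first and writes x. */
        #pragma omp task depend(out: x) shared(x)
        {
            x = 2;
        }
        /* T2 is generated second, so its in-dependence on x
         * orders it after T1. */
        #pragma omp task depend(in: x) shared(x, seen)
        {
            seen = x;
        }
        #pragma omp taskwait
    }
    return seen;
}
```

With this ordering, `run_ordered_tasks()` should return 2 under any conforming runtime, since the dependence is established at task-generation time rather than by the position of the `in` and `out` clauses alone.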
Hi, I would like to take advantage of lightweight thread libraries, e.g., Argobots, and runtime systems over them, e.g., BOLT. I have successfully configured and built my lib. But I need clarification on the output libs, e.g., libgomp.so, libiomp5.so, libomp.so. Please tell me which one I should use. Do the three libs mean GCC OMP runtime, Intel OMP runtime, and BOLT, respectively?
cmake .. -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/home/fx/lib/bolt -DCMAKE_C_COMPILER=/home/fx/.local/gcc-11.2.0/bin/gcc -DCMAKE_CXX_COMPILER=/home/fx/.local/gcc-11.2.0/bin/g++ -DOPENMP_TEST_C_COMPILER=/home/fx/.local/gcc-11.2.0/bin/gcc -DOPENMP_TEST_CXX_COMPILER=/home/fx/.local/gcc-11.2.0/bin/g++ -DCMAKE_BUILD_TYPE=Release -DLIBOMP_USE_ARGOBOTS=on -DLIBOMP_ARGOBOTS_INSTALL_DIR=/home/fx/lib/argobots/ | tee c.txt
make V=1 VERBOSE=1 2>&1 | tee m.txt
Trying to build BOLT with internal Argobots on current master results in a CMake error:
$ cmake ../ -DCMAKE_INSTALL_PREFIX=$HOME/opt/bolt-git -DLIBOMP_USE_ARGOBOTS=on
...
CMake Error at /usr/share/cmake-3.10/Modules/ExternalProject.cmake:2474 (message):
No download info given for 'libabt' and its source directory:
~/src/bolt/git/bolt/external/argobots
is not an existing non-empty directory. Please specify one of:
* SOURCE_DIR with an existing non-empty directory
* DOWNLOAD_COMMAND
* URL
* GIT_REPOSITORY
* SVN_REPOSITORY
* HG_REPOSITORY
* CVS_REPOSITORY and CVS_MODULE
Call Stack (most recent call first):
/usr/share/cmake-3.10/Modules/ExternalProject.cmake:3029 (_ep_add_download_command)
external/CMakeLists.txt:32 (ExternalProject_Add)
I get this warning when I try to compile any of the examples.
warning: incompatible implicit declaration of built-in function 'malloc'
double *a = (double *)malloc(sizeof(double)*num);
I was able to fix it by adding #include <stdlib.h> to the examples.
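A minimal sketch of the fix described above, assuming the examples allocate their arrays as in the quoted line; the helper name `alloc_doubles` is hypothetical and does not appear in the BOLT examples.

```c
#include <stddef.h>
#include <stdlib.h>  /* declares malloc and free, which silences the warning */

/* Hypothetical helper: allocates num doubles and sets them to zero.
 * Returns NULL if the allocation fails; the caller frees the result. */
double *alloc_doubles(size_t num) {
    double *a = (double *)malloc(sizeof(double) * num);
    if (a != NULL) {
        for (size_t i = 0; i < num; i++)
            a[i] = 0.0;
    }
    return a;
}
```

Without the header, the compiler falls back to an implicit declaration of `malloc`, which is what triggers the warning.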
This simple patch allows the builder of bolt to put omp.h in a different directory. I am working with a more current LLVM omp.h than what bolt builds. With this patch, I can install bolt into the LLVM installation and not clobber omp.h.
git diff
diff --git a/runtime/src/CMakeLists.txt b/runtime/src/CMakeLists.txt
index 042a5200..8a237805 100644
--- a/runtime/src/CMakeLists.txt
+++ b/runtime/src/CMakeLists.txt
@@ -329,7 +329,9 @@ add_dependencies(bolt-libomp-micro-tests bolt-libomp-test-deps)
# We want to install libomp in DESTDIR/CMAKE_INSTALL_PREFIX/lib
# We want to install headers in DESTDIR/CMAKE_INSTALL_PREFIX/include
if(${OPENMP_STANDALONE_BUILD})
- set(LIBOMP_HEADERS_INSTALL_PATH include)
+ if(NOT LIBOMP_HEADERS_INSTALL_PATH)
+ set(LIBOMP_HEADERS_INSTALL_PATH include)
+ endif()
else()
string(REGEX MATCH "[0-9]+\\.[0-9]+(\\.[0-9]+)?" CLANG_VERSION ${PACKAGE_VERSION})
set(LIBOMP_HEADERS_INSTALL_PATH "${OPENMP_INSTALL_LIBDIR}/clang/${CLANG_VERSION}/include")
FYI, below are the cmake options I have in our build_bolt.sh script. The above patch allows me to not clobber omp.h. I also turn off the aliases because I want bolt to be user selectable with -fopenmp=libbolt. I have a simple LLVM patch that I want to push upstream to support -fopenmp=libbolt. This does not push the sources for bolt; it simply allows the option to be used when bolt is built as an external component.
MYCMAKEOPTS="
-DCMAKE_INSTALL_PREFIX=$BOLT_INSTALL_DIR
$AOMP_ORIGIN_RPATH
-DCMAKE_C_COMPILER=$AOMP_CC_COMPILER
-DCMAKE_CXX_COMPILER=$AOMP_CXX_COMPILER
-DOPENMP_TEST_C_COMPILER=$AOMP_CC_COMPILER
-DOPENMP_TEST_CXX_COMPILER=$AOMP_CXX_COMPILER
-DCMAKE_BUILD_TYPE=Release
-DOPENMP_ENABLE_LIBOMPTARGET=OFF
-DLIBOMP_HEADERS_INSTALL_PATH=include/bolt
-DLIBOMP_INSTALL_ALIASES=OFF
-DLIBOMP_USE_ARGOBOTS=on"
I am trying to leverage low-level Argobots features inside BOLT tasks (BOLT 1.0rc3, built with internal Argobots). In particular, I would like to block a set of tasks on a condition variable and eventually unblock them from a different task, as in this example:
#include <abt.h>
#include <stdio.h>
int main(int argc, char **argv)
{
  int n = 10;
  #pragma omp parallel
  {
    #pragma omp master
    {
      int blocked = 0;
      ABT_mutex mtx;
      ABT_cond cond;
      ABT_mutex_create(&mtx);
      ABT_cond_create(&cond);
      for (int i = 0; i < n; ++i) {
        printf("Discovering task %d\n", i);
        #pragma omp task shared(mtx, cond, blocked)
        {
          printf("Task %d blocking\n", i);
          ABT_mutex_lock(mtx);
          blocked++;
          ABT_cond_wait(cond, mtx);
          ABT_mutex_unlock(mtx);
        }
      }
      #pragma omp task shared(cond, mtx, blocked)
      {
        printf("Broadcast task starting\n");
        while (n != blocked) {
          ABT_thread_yield();
        }
        // mutex required to ensure all tasks entered cond
        ABT_mutex_lock(mtx);
        printf("Broadcast task broadcasting\n");
        ABT_cond_broadcast(cond);
        ABT_mutex_unlock(mtx);
      }
      #pragma omp taskwait
    }
  }
  return 0;
}
What I see is that all tasks are created and only the first task starts executing. Output:
$ ./test_bolt_abt_cond
Discovering task 0
Discovering task 1
Discovering task 2
Discovering task 3
Discovering task 4
Discovering task 5
Discovering task 6
Discovering task 7
Discovering task 8
Discovering task 9
Task 0 blocking
Any idea why only the first task is executing? Are the other runnable tasks not passed to Argobots? Do I need to set some environment variables to make this work?
When building BOLT with Argobots, one needs to set -DLIBOMP_USE_ITT_NOTIFY=off manually, which is tedious. The CMake script should handle it (i.e., turn it off by default) when Argobots is specified as the threading layer.
Currently, the thread information for each OpenMP task is taken from blocked ULTs. This works for common task patterns but not for nested task codes that start with single or master constructs (like Fibonacci).
The problem is that we are doubling the thread structures, including the task queue, so a newly created OpenMP task may be placed either in the real thread's queue or in the "fake" one. Then, when we try to join the task, we cannot be sure where it is.
However, this issue is related to the OpenMP committee's decision on whether two OpenMP tasks can be executed concurrently while sharing the same thread ID.
Compiling the following code snippet with clang 9 and running it with BOLT (1.0rc3 downloaded from the website, built with Argobots support) leads to an assertion failure:
int main(int argc, char **argv)
{
  #pragma omp parallel
  #pragma omp master
  {
    #pragma omp task untied
    { }
  }
  return 0;
}
The assertion and backtrace:
[New Thread 0x7ffff6d1f700 (LWP 19430)]
[New Thread 0x7ffff651e700 (LWP 19431)]
[New Thread 0x7ffff5d1d700 (LWP 19432)]
Assertion failure at z_Linux_util.cpp(3952): taskdata->td_flags.complete == 0.
OMP: Error #13: Assertion failure at z_Linux_util.cpp(3952).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Thread 3 "test_omp_nested" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff651e700 (LWP 19431)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff751f801 in __GI_abort () at abort.c:79
#2 0x00007ffff7b3186a in __kmp_abort_process () at bolt-1.0rc3/runtime/src/kmp_runtime.cpp:463
#3 0x00007ffff7b2f765 in __kmp_fatal (message=...) at bolt-1.0rc3/runtime/src/kmp_i18n.cpp:868
#4 0x00007ffff7b2befd in __kmp_debug_assert (msg=0x7ffff7bb0ada "taskdata->td_flags.complete == 0", file=0x7ffff7bb3c37 "z_Linux_util.cpp", line=3952)
at bolt-1.0rc3/runtime/src/kmp_debug.cpp:74
#5 0x00007ffff7b8dec2 in __kmp_abt_execute_task (arg=0xb1ee40) at bolt-1.0rc3/runtime/src/z_Linux_util.cpp:3952
#6 0x00007ffff72d1beb in ABTD_thread_func_wrapper_thread (p_arg=0x7fffe6802f00) at ../../src/arch/abtd_thread.c:18
#7 0x00007ffff72d1fb1 in make_fcontext () at ../../src/arch/fcontext/make_x86_64_sysv_elf_gas.S:64
#8 0x00007fffe6802e00 in ?? ()
#9 0x00007ffff7b8de00 in ?? () at bolt-1.0rc3/runtime/src/z_Linux_util.cpp:3441 from ~/opt/bolt-1.0rc3/lib/libomp.so.5
#10 0x0000000000a13f80 in ?? ()
#11 0x00007fffe6fff000 in ?? ()
#12 0x0000000000000000 in ?? ()
I don't see this happening with tied tasks.
The child task management design needs to be improved.
The first implementation scans the queued tasks from the current task to the end of the queue and frees those tasks, so we can assert that child tasks are completed before the parent continues.
The future idea is to manage child tasks inside the task structure, so that each task can check its children directly without involving the thread structure.
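The idea of tracking children inside the task structure can be sketched as follows. This is an illustration under stated assumptions, not BOLT's implementation: the type `sketch_task_t` and the functions `sketch_task_init`, `sketch_task_finish`, and `sketch_children_done` are hypothetical names, and a per-task atomic counter is just one possible way to let a parent check its children without consulting any thread-level queue.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical task descriptor; not BOLT's actual task data structure. */
typedef struct sketch_task {
    struct sketch_task *parent;   /* NULL for a root task */
    atomic_int pending_children;  /* children created but not yet finished */
} sketch_task_t;

/* Initialize a task and register it with its parent, if any. */
void sketch_task_init(sketch_task_t *t, sketch_task_t *parent) {
    t->parent = parent;
    atomic_init(&t->pending_children, 0);
    if (parent != NULL)
        atomic_fetch_add(&parent->pending_children, 1);
}

/* Called when a task completes: tell the parent one child is done. */
void sketch_task_finish(sketch_task_t *t) {
    if (t->parent != NULL)
        atomic_fetch_sub(&t->parent->pending_children, 1);
}

/* What a taskwait could poll: nonzero once all direct children are done. */
int sketch_children_done(sketch_task_t *t) {
    return atomic_load(&t->pending_children) == 0;
}
```

A parent would call `sketch_children_done` in its taskwait loop and yield while it returns zero; the check touches only the parent's own descriptor, regardless of which queue the children were placed in.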
Build information:
Testsuite run parameters:
TEST_FC=ifort
TEST_FFLAGS="-g -O2 -qopenmp -I/~/INSTALL/bolt/include -L/~/INSTALL/bolt/lib -Wl,-rpath=~/INSTALL/bolt/lib"
make ftest
Error message:
Testing for "omp_threadprivate":
Generating sources .............. success
Compiling soures ................ success
Running test with 8 threads ../bin/fortran/test_omp_threadprivate: relocation error: ./bin/fortran/test_omp_threadprivate: symbol kmp_aligned_malloc, version VERSION not defined in file libiomp5.so with link time reference
.... failed 100% of the tests
Hi,
I compared BOLT + Argobots against GCC 11 with -fopenmp and found that BOLT is about 2x slower.
I built Argobots and BOLT according to the guide; both are built as shared libraries.
I wonder whether this is the reason.
Is it possible to build BOLT as a static library, and to have BOLT use Argobots as a static library?
The test case is the matrix multiplication benchmark from taskflow/benchmarks/matrix_multiplication/.
The compile command for linking BOLT is below:
g++ main.cpp omp.cpp taskflow.cpp tbb.cpp -I~/Work/tbb/include -L~/Work/tbb/build/ -ltbb -I~/Work/taskflow -I~/Work/CLI11 -I~/Work/bolt-omp/include -L~/Work/bolt-omp/lib -lbolt -L~/Work/bolt-abt/lib -labt -o test_bolt -O3
./test_bolt -t 2 -m omp
vs.
The compile command using the default OpenMP runtime:
g++ main.cpp omp.cpp taskflow.cpp tbb.cpp -I~/Work/tbb/include -L~/Work/tbb/build/ -ltbb -I~/Work/taskflow -I~/Work/CLI11 -fopenmp -o test_omp -O3
./test_omp -t 2 -m omp
I would appreciate any suggestions for getting better performance from BOLT.
Thanks.
When I try to use the embedded Argobots library (external/argobots) and compile BOLT in parallel (e.g., make -j), it causes the following compile-time error:
$ cmake -DLIBOMP_USE_ARGOBOTS=on
...
$ make -j
Performing configure step for 'libabt'
...
bolt/runtime/src/kmp.h:23:17: fatal error: abt.h: No such file or directory
#include <abt.h>