Legion

Legion is a parallel programming model for distributed, heterogeneous machines.

Branches

The Legion team uses this repository for active development, so please make sure you're using the right branch for your needs:

  • stable - This is the default branch if you clone the repository. It is generally about a month behind the master branch, allowing us to get some mileage on larger changes before foisting them on everybody. Most users of Legion should use this branch, although you should be prepared to try the master branch if you run into issues. Updates are moved to the stable branch roughly monthly, although important bug fixes may be applied directly when needed. Each batch of updates is given a "version" number, and CHANGES.txt lists the major changes.
  • master - This is the "mainline" used by the Legion team, and contains changes and bug fixes that may not have made it into the stable branch yet. If you are a user of "bleeding-edge" Legion functionality, you will probably need to be using this branch.
  • lots of other feature branches - These exist as necessary for larger changes, and users will generally want to steer clear of them. :)

Overview

Legion is a programming model and runtime system designed to decouple the specification of parallel algorithms from their mapping onto distributed heterogeneous architectures. Since running on the target class of machines requires distributing not just computation but data as well, Legion presents the abstraction of logical regions for describing the structure of program data in a machine-independent way. Programmers specify the partitioning of logical regions into subregions, which provides a mechanism for communicating both the independence and locality of program data to the programming system. Because the programming system has knowledge of both the structure of tasks and data within the program, it can help with a host of problems that commonly burden the programmer:

  • Discovering/verifying correctness of parallel execution: determining when two tasks can be run in parallel without a data race is often difficult. Legion provides mechanisms for creating both implicit and explicit parallel task launches. For implicit constructs Legion will automatically discover parallelism. For explicit constructs, Legion will notify the programmer if there are potential data races between tasks intended to be run in parallel.
  • Managing communication: when Legion determines that there are data dependencies between two tasks run in different locations, Legion will automatically insert the necessary copies and apply the necessary constraints so the second task will not run until its data is available. We describe how tasks and data are placed in the next paragraph on mapping Legion programs.

The Legion programming model is designed to abstract computations in a way that makes them portable across many different potential architectures. The challenge then is to make it easy to map the abstracted computation of the program onto actual architectures. At a high level, mapping a Legion program entails making two kinds of decisions:

  1. For each task: select a processor on which to run the task.
  2. For each logical region a task needs: select a memory in which to create a physical instance of the logical region for the task to use.

To facilitate this process, Legion introduces a novel runtime 'mapping' interface. One of the NON-goals of the Legion project was to design a programming system that was magically capable of making intelligent mapping decisions. Instead, the mapping interface provides a declarative mechanism for the programmer to communicate mapping decisions to the runtime system without having to write any code to carry them out (e.g. the code that actually performs a copy or synchronization). Furthermore, because the mapping interface is dynamic, the programmer can make mapping decisions based on information that may only be available at runtime. This includes decisions based on:

  • Program data: some computations depend on the data itself (e.g. whether an irregular graph is sparse or dense in its number of edges).
  • System data: which processors or nodes are currently up or down, or which are running fast or slow to conserve power.
  • Execution data: profiling data fed back to the mapper about how a given mapping performed previously, or which processors are currently over- or under-loaded.

All of this information is made available via various mapper calls, some of which ask the mapper for decisions while others simply communicate information to it.

One very important property of the mapping interface is that no mapping decisions are capable of impacting the correctness of the program. Consequently, all mapping decisions made are only performance decisions. Programmers can then easily tune a Legion application by modifying the mapping interface implementation without needing to be concerned with how their decisions impact correctness. Ultimately, this makes it possible in Legion to explore whole spaces of mapping choices (which tasks run on CPUs or GPUs, or where data gets placed in the memory hierarchy) simply by enumerating all the possible mapping decisions and trying them.

To make it easy to get a working program, Legion provides a default mapper implementation that uses heuristics to make mapping decisions. In general these decisions are good, but they cannot be optimal for every application and architecture. All calls in the mapping interface are C++ virtual functions that can be overridden, so programmers can extend the default mapper and override only the mapping calls that are limiting performance. Alternatively, a program can implement the mapping interface entirely from scratch.

For more details on the Legion programming model and its current implementation, we refer you to our Supercomputing paper:

http://theory.stanford.edu/~aiken/publications/papers/sc12.pdf

Contents

This repository includes the following contents:

  • tutorial: Source code for the tutorials.
  • examples: Larger examples for advanced programming techniques.
  • apps: Several complete Legion applications.
  • language: The Regent programming language compiler and examples.
  • runtime: The core runtime components:
    • legion: The Legion runtime itself (see legion.h).
    • realm: The Realm low-level runtime (see realm.h).
    • mappers: Several mappers, including the default mapper (see default_mapper.h).
  • tools: Miscellaneous tools.

Dependencies

To get started with Legion, you'll need:

  • Linux, macOS, or another Unix
  • A C++17 (or newer) compiler (GCC, Clang, Intel, or PGI) and GNU Make
  • Optional: CMake 3.16 or newer
  • Optional: Python 3.5 or newer (used for tools and Python bindings)
    • Note: Python 3.8 or newer is required for tools/legion_prof.py
  • Optional: Rust 1.74 or newer (used for Rust profiler)
  • Optional: CUDA 10.0 or newer (for NVIDIA GPUs)
  • Optional: CUDA 11.7 or newer if using Legion's built-in complex reduction operators
  • Optional: GASNet (for networking, see installation instructions)
  • Optional: LLVM 7-14 (for dynamic code generation)
  • Optional: HDF5 (for file I/O)

Installing

Legion is currently built as part of each application. To try a Legion application, just run make in its directory. The LG_RT_DIR environment variable is used to locate the Legion runtime directory. For example:

git clone https://github.com/StanfordLegion/legion.git
export LG_RT_DIR="$PWD/legion/runtime"
cd legion/examples/circuit
make
./circuit

Makefile Variables

The Legion Makefile includes several variables which influence the build. These may either be set in the environment (e.g. DEBUG=0 make) or at the top of each application's Makefile.

  • DEBUG=<0,1>: controls optimization level and enables various dynamic checks which are too expensive for release builds.
  • OUTPUT_LEVEL=<level_name>: controls the compile-time logging level.
  • USE_CUDA=<0,1>: enables CUDA support.
  • USE_GASNET=<0,1>: enables GASNet support (see installation instructions).
  • USE_LLVM=<0,1>: enables LLVM support.
  • USE_HDF=<0,1>: enables HDF5 support.
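As an illustration, several of the variables above can be combined on one command line (a sketch, assuming the application's Makefile honors these variables and the required dependencies are installed):

```shell
# Release build with CUDA and GASNet enabled, set via the environment
# rather than edited into the application's Makefile
DEBUG=0 USE_CUDA=1 USE_GASNET=1 make
```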

Build Flags

In addition to Makefile variables, compilation is influenced by a number of build flags. These flags may be added to variables in the environment (or again set inside the Makefile).

Command-Line Flags

Legion and Realm accept command-line arguments for various runtime parameters. Below are some of the more commonly used flags:

  • -level <category>=<int>: sets logging level for category
  • -logfile <filename>: directs logging output to filename
  • -ll:cpu <int>: CPU processors to create per process
  • -ll:gpu <int>: GPU processors to create per process
  • -ll:util <int>: utility processors to create per process
  • -ll:csize <int>: size of CPU DRAM memory per process (in MB)
  • -ll:gsize <int>: size of GASNet global memory available per process (in MB)
  • -ll:rsize <int>: size of GASNet registered RDMA memory available per process (in MB)
  • -ll:fsize <int>: size of framebuffer memory for each GPU (in MB)
  • -ll:zsize <int>: size of zero-copy memory for each GPU (in MB)
  • -lg:window <int>: maximum number of tasks that can be created in a parent task window
  • -lg:sched <int>: minimum number of tasks to try to schedule for each invocation of the scheduler

The default mapper also has several flags for controlling the default mapping. See default_mapper.cc for more details.

Developing Programs

To start a new Legion application, make a new directory and copy apps/Makefile.template into your directory under the name Makefile. Fill in the appropriate fields at the top of the Makefile with the filenames needed for your application.

Most Legion APIs are described in legion.h; a smaller number are described in the various header files in the runtime/realm directory. The default mapper is available in default_mapper.h.

Debugging

Legion has a number of tools to aid in debugging programs.

Extended Correctness Checks

Compile with DEBUG=1 PRIVILEGE_CHECKS=1 BOUNDS_CHECKS=1 make and rerun the application. This enables dynamic checks for privilege violations and out-of-bounds accesses in the application. (These checks are not enabled by default because they are relatively expensive.) If the application runs without terminating with an error, then continue on to Legion Spy.

Legion Spy

Legion provides a task-level visualization tool called Legion Spy. This captures the logical and physical dependence graphs. These may help, for example, as a sanity check to ensure that the correct sequence of tasks is being launched (and the tasks have the correct dependencies). Legion Spy also has a self-checking mode which can validate the correctness of the runtime's logical and physical dependence algorithms.

To capture a trace, invoke the application with -lg:spy -logfile spy_%.log. (No special compile-time flags are required.) This will produce a log file per node. Call the post-processing script to render PDF files of the dependence graphs:

./app -lg:spy -logfile spy_%.log
$LG_RT_DIR/../tools/legion_spy.py -dez spy_*.log

To run Legion Spy's self-checking mode, Legion must be built with the flag USE_SPY=1. Following this, the application can be run again, and the script used to validate (or render) the trace.

DEBUG=1 USE_SPY=1 make
./app -lg:spy -logfile spy_%.log
$LG_RT_DIR/../tools/legion_spy.py -lpa spy_*.log
$LG_RT_DIR/../tools/legion_spy.py -dez spy_*.log

Profiling

Legion contains a task-level profiler. No special compile-time flags are required; however, it is recommended to build with DEBUG=0 make so that debug-mode checks do not distort the results.

To profile an application, run with -lg:prof <N> where N is the number of nodes to be profiled. (N can be less than the total number of nodes, in which case only a subset of the nodes is profiled.) Use the -lg:prof_logfile <logfile> flag to save the output from each node to a separate file. The argument to the -lg:prof_logfile flag follows the same format as for -logfile, except that a % (to be replaced by the node number) is mandatory. Finally, pass the resulting log files to legion_prof.py.

DEBUG=0 make
./app -lg:prof <N> -lg:prof_logfile prof_%.gz
$LG_RT_DIR/../tools/legion_prof.py prof_*.gz

This will generate a subdirectory called legion_prof under the current directory, including a file named index.html. Open this file in a browser.

Other Features

  • In-order Execution: Users can force the high-level runtime to execute all tasks in program order by passing the -lg:inorder flag on the command line.

  • Dynamic Independence Tests: Users can request that the high-level runtime perform dynamic independence tests between regions and partitions by passing the -lg:dynamic flag on the command line.

Issues

CUDA Thrust cannot access Device

CUDA Thrust calls such as the following:

thrust::exclusive_scan(thrust::device_ptr<uint2>(input), thrust::device_ptr<uint2>(input + numElements), thrust::device_ptr<uint2>(output), zero);

give the error (generated with -DREALM_BACKTRACE):

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  function_attributes(): after cudaFuncGetAttributes: invalid device function
BACKTRACE (0, 7f3c38567700)
----------
./composite() [0xb21c75]
  /lib/x86_64-linux-gnu/libc.so.6 : ()+0x36d40
  /lib/x86_64-linux-gnu/libc.so.6 : gsignal()+0x39
  /lib/x86_64-linux-gnu/libc.so.6 : abort()+0x148
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : __gnu_cxx::__verbose_terminate_handler()+0x155
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0x5e836
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0x5e863
  /usr/lib/x86_64-linux-gnu/libstdc++.so.6 : ()+0x5eaa2
  ./composite : thrust::system::cuda::detail::bulk_::detail::throw_on_error(cudaError, char const*)+0x50
  ./composite : thrust::system::cuda::detail::bulk_::detail::function_attributes_t thrust::system::cuda::detail::bulk_::detail::function_attributes<void (*)(thrust::system::cuda::detail::bulk_::detail::cuda_task<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<9ul>, 128ul>, 0ul>, thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::accumulate_tiles, thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1u>, thrust::device_ptr<uint2>, thrust::system::cuda::detail::aligned_decomposition<long>, thrust::detail::normal_iterator<thrust::pointer<uint2, thrust::system::cuda::detail::tag, thrust::use_default, thrust::use_default> >, thrust::plus<uint2>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> > >)>(void (*)(thrust::system::cuda::detail::bulk_::detail::cuda_task<thrust::system::cuda::detail::bulk_::parallel_group<thrust::system::cuda::detail::bulk_::concurrent_group<thrust::system::cuda::detail::bulk_::agent<9ul>, 128ul>, 0ul>, thrust::system::cuda::detail::bulk_::detail::closure<thrust::system::cuda::detail::scan_detail::accumulate_tiles, thrust::tuple<thrust::system::cuda::detail::bulk_::detail::cursor<1u>, thrust::device_ptr<uint2>, thrust::system::cuda::detail::aligned_decomposition<long>, thrust::detail::normal_iterator<thrust::pointer<uint2, thrust::system::cuda::detail::tag, thrust
----------
BACKTRACE (0, 7f3d429f9700)
----------
./composite() [0xb21c75]
  /lib/x86_64-linux-gnu/libc.so.6 : ()+0x36d40
  /lib/x86_64-linux-gnu/libglib-2.0.so.0 : g_main_context_check()+0x134
  /lib/x86_64-linux-gnu/libglib-2.0.so.0 : ()+0x48f7b
  /lib/x86_64-linux-gnu/libglib-2.0.so.0 : g_main_context_iteration()+0x2c
  /usr/local/Qt/5.4/gcc_64/lib/libQt5Core.so.5 : QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)+0xc3
  /usr/local/Qt/5.4/gcc_64/lib/libQt5Core.so.5 : QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>)+0xcb
  /usr/local/Qt/5.4/gcc_64/lib/libQt5Core.so.5 : QCoreApplication::exec()+0x85
  /home/xin/gitlab/legioncomposite/QtViewer/libQtViewer.so.1 : interactThread(void*)+0x6d
  /lib/x86_64-linux-gnu/libpthread.so.0 : ()+0x8182
  /lib/x86_64-linux-gnu/libc.so.6 : clone()+0x6d

----------

A simple test of this failure is:

thrust::host_vector<int> H(4);
H[0] = 14;
H[1] = 20;
H[2] = 38;
H[3] = 46;
thrust::device_vector<int> D = H;
thrust::inclusive_scan(D.begin(), D.end(), D.begin());

Legion Prof: Regression - Missing Profiler Output

Looks like we are missing profiler output. The profiling preamble containing metadata gets printed, the application executes and terminates normally, but no profiling info is actually provided. Tested with cgsolver example on master.

./cgsolver -hl:prof 1 -cat legion_prof -level 2

User-level querying of currently mapped (and previously mapped) regions

Suppose you want to create a Legion library that requires access to a region passed as an argument. You have two ways to do this: either (1) you could create and launch a task, or (2) you could inline map the region. For the purposes of this discussion, let's assume you want to follow approach (2).

Currently, this won't work, because the region might already be mapped in the context in which the library is running. Or maybe it was previously mapped, and just happens to be currently unmapped. My understanding is that either of these scenarios will result in erroneous or poorly-defined behaviour if the library attempts to map the region a second time.

One way to deal with this might be for the user to query the state of the regions currently (or previously) mapped in this context. It's worth noting that this is in fact what the runtime would do (on behalf of the user) in scenario (1): the runtime has the ability to introspect the context and discover what regions need to be unmapped (and subsequently re-mapped) for any given operation. But there is currently no equivalent user-visible mechanism, which means that approach (2) is effectively infeasible for Legion libraries today.

CUDA init code fails to map sysmem with -ll:zsize 0

A normal run of a Legion app with a gpu enabled (e.g. full_ghost -ll:gpu 1 -ll:cpu 4 -level gpu=2) results in the following (good) log message:

[0 - 7fa3aaff97c0] {2}{gpu}: memory 60000000 successfully registered with GPU 80000005

However, adding -ll:zsize 0 to the command line gives this instead:

[0 - 7f2de05427c0] {4}{gpu}: GPU #0 has no mapping for registered memory (0x7f2da0dff100) !?
[0 - 7f2de05427c0] {4}{gpu}: GPU 80000005 has no pinned system memories!?

which results in badness later on because there are no copy paths between sysmem and FB. These are two separate sections of code, but both are related to host memory registration, so there may be some CUDA driver weirdness here.

missing legion_trace.h

The commit 8c88291 appears to have not included legion_trace.h.

make[1]: Entering directory `/home/nwatkins/legion/examples/00_hello_world'
g++ -o ../../runtime/legion_ops.o -c ../../runtime/legion_ops.cc  -I../../runtime  -DDEBUG_LOW_LEVEL -DDEBUG_HIGH_LEVEL -ggdb  -DCOMPILE_TIME_MIN_LEVEL=LEVEL_DEBUG   -DSHARED_LOWLEVEL
../../runtime/legion_ops.cc:22:26: fatal error: legion_trace.h: No such file or directory
compilation terminated.

segmentation fault in activemsg.cc

I have merged the most up-to-date master branch into the dma branch. While running the dma_random test, I experienced a reproducible segmentation fault in activemsg.cc. I believe I have never seen this failure before. The backtrace of the failed thread is attached.

The following steps reproduce the failure scenario:

  1. compile test/dma_random under the dma branch
  2. GASNET_BACKTRACE=1 mpirun -n 2 -H n0000,n0001 --bind-to none dma_random -ll:dsize 1024 -ll:rsize 1024 -ll:gsize 1024 -ll:ahandlers 3

[0] Thread 9 (Thread 0x7f29b5911700 (LWP 7537)):
[0] #0 0x00007f2a645eb619 in __libc_waitpid (pid=7546, stat_loc=stat_loc@entry=0x7f29b590e210, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
[0] #1 0x00007f2a645701d2 in do_system (line=) at ../sysdeps/posix/system.c:148
[0] #2 0x0000000000d56b28 in gasneti_bt_gdb ()
[0] #3 0x0000000000d5998b in gasneti_print_backtrace ()
[0] #4 0x0000000000db24c1 in gasneti_defaultSignalHandler ()
[0] #5
[0] #6 0x00007f2a63e70471 in opal_memory_ptmalloc2_int_free () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #7 0x00007f2a63e70bb3 in opal_memory_ptmalloc2_free () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #8 0x00000000009e050c in ContiguousPayload::copy_data (this=0x2243720, dest=0x7f2a5a517f40) at ../../runtime//activemsg.cc:2305
[0] #9 0x00000000009df328 in OutgoingMessage::reserve_srcdata (this=0x225a940) at ../../runtime//activemsg.cc:1735
[0] #10 0x00000000009e1b61 in ActiveMessageEndpoint::enqueue_message (this=0x2239a40, hdr=0x225a940, in_order=true) at ../../runtime//activemsg.cc:1080
[0] #11 0x00000000009e57da in EndpointManager::enqueue_message (this=0x2239660, target=1, hdr=0x225a940, in_order=true) at ../../runtime//activemsg.cc:1887
[0] #12 0x00000000009e016f in enqueue_message (target=1, msgid=171, args=0x7f29b5910bf0, arg_size=24, payload=0x225b6a0, payload_size=172, payload_mode=2, dstptr=0x0) at ../../runtime//activemsg.cc:2220
[0] #13 0x0000000000a47ff6 in ActiveMessageMediumNoReply<171, Realm::MetadataResponseMessage::RequestArgs, &Realm::MetadataResponseMessage::handle_request>::request (dest=1, args=..., data=0x225b6a0, datalen=172, payload_mode=2, dstptr=0x0) at ../../runtime/activemsg.h:607
[0] #14 0x0000000000a47876 in Realm::MetadataResponseMessage::send_request (target=1, id=3766419465, data=0x225b6a0, datalen=172, payload_mode=2) at ../../runtime//realm/metadata.cc:297
[0] #15 0x0000000000a4772f in Realm::MetadataRequestMessage::handle_request (args=...) at ../../runtime//realm/metadata.cc:248
[0] #16 0x0000000000a8d93c in IncomingShortMessage<Realm::MetadataRequestMessage::RequestArgs, 170, &Realm::MetadataRequestMessage::handle_request, 2>::run_handler (this=0x7f29b520c760) at ../../runtime/activemsg.h:343
[0] #17 0x00000000009def6f in IncomingMessageManager::handler_thread_loop (this=0x223a8a0) at ../../runtime//activemsg.cc:770
[0] #18 0x00000000009e778e in Realm::Thread::thread_entry_wrapper<IncomingMessageManager, &IncomingMessageManager::handler_thread_loop> (obj=0x223a8a0) at ../../runtime//realm/threads.inl:127
[0] #19 0x0000000000a227b7 in Realm::KernelThread::pthread_entry (data=0x223ab00) at ../../runtime//realm/threads.cc:555
[0] #20 0x00007f2a652fa182 in start_thread (arg=0x7f29b5911700) at pthread_create.c:312
[0] #21 0x00007f2a64624fbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

invalid id type found in get_event_impl

In the latest dma branch I see a periodic failure:

Assertion failed: (0), function get_event_impl, file /Users/nwatkins/src/legion/runtime/realm/runtime_impl.cc, line 1256

Running

GASNET_BACKTRACE=1 GASNET_MASTERIP='127.0.0.1' GASNET_SPAWN=-L SSH_SERVERS="localhost localhost" amudprun -np 2 ./tester_io

in test/hdf_attach_subregion_parallel

Stack trace is so far evading capture. Will update when possible.

circuit example does not compile

Attempting to compile the full_circuit example using the stable branch fails. It compiles fine on master.

g++ -o circuit.o -c circuit.cc  -I/home/sdalton/legion/runtime -I/home/sdalton/legion/runtime/realm  -march=native -DDEBUG_LOW_LEVEL -DDEBUG_HIGH_LEVEL -ggdb  -DCOMPILE_TIME_MIN_LEVEL=LEVEL_DEBUG         -Wall -Wno-strict-overflow -Werror -DSHARED_LOWLEVEL
circuit.cc: In function 'void top_level_task(const LegionRuntime::HighLevel::Task*, const std::vector<LegionRuntime::HighLevel::PhysicalRegion>&, LegionRuntime::HighLevel::Context, LegionRuntime::HighLevel::HighLevelRuntime*)':
circuit.cc:135:29: error: 'LegionRuntime::TimeStamp' has not been declared
   ts_start = LegionRuntime::TimeStamp::get_current_time_in_micros();
                             ^
circuit.cc:148:27: error: 'LegionRuntime::TimeStamp' has not been declared
   ts_end = LegionRuntime::TimeStamp::get_current_time_in_micros();
                           ^
make: *** [circuit.o] Error 1

However, the dependent_circuit example fails with a similar error on master.

Resource exhaustion in S3D

With stencils enabled, something is going wrong with reuse and/or collection of
unused instances and S3D runs out of FB memory after running for a while.
(In the case of PRF with 48^3 per node, "a while" is "3 time steps.") The bug was
introduced some time after 7b73e22, and
is suspected to be related to the changes to the handling of restricted instances.

Spurious "uninitialized data" warnings from HLR

Running the full_ghost example with either LLR (you need -ll:cpu 4 for Realm) results in the following:

[0 - 7fda208a5700] {4}{runtime}: WARNING: Region requirement 2 of operation spmd (UID 4) is using uninitialized data for field(s) 2 of logical region (13,1,8)
[0 - 7fda208a5700] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 4) is using uninitialized data for field(s) 2 of logical region (8,1,3)
[0 - 7fda208a5700] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 5) is using uninitialized data for field(s) 2 of logical region (10,1,5)
[0 - 7fda208a5700] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 6) is using uninitialized data for field(s) 2 of logical region (12,1,7)

These warnings are being produced for region requirements that are using READ_ONLY privileges but SIMULTANEOUS coherence, so the initializer of the data is one of the other concurrent tasks. This is pretty much the only reason why you'd ask for this combination, so I suggest we eliminate the warning message in this case.

dma: unimplemented hdf-to-global transfer

Seeing this in some situations:

To be implemented: hdf memory -> global memory
tester_io: /home/nwatkins/src/legion/runtime//lowlevel_dma.cc:2990: void LegionRuntime::LowLevel::CopyRequest::perform_new_dma(LegionRuntime::LowLevel::Memory, LegionRuntime::LowLevel::Memory) [with unsigned int DIM = 2u]: Assertion `0' failed.

release operation isn't launched after read-only task

I pushed a read/write benchmark into the attach-file branch. The benchmark is designed to demonstrate that read_only and read_write tasks can run in parallel in the Legion runtime by using different local copies of the same logical region. To achieve this, I expect the release operation to execute before the read_only task returns, thereby enabling others to execute the acquire operation and start parallel tasks.

However, it seems that in the benchmark all tasks are being serialized. My hypothesis is that the release operation isn't being performed before read-only tasks return. I am not sure whether the mistake is on the application side or the Legion runtime side. You should be able to find the benchmark at test/read-write in the attach-file branch. The following command line should reproduce everything:

./read_write -n 1024 -t 256 -ll:cpu 8 -ll:csize 2048 -ll:dma 8 -hl:prof 1 -cat legion_prof -level 2 -logfile log.txt

Build error when using 64bit IDs: -DLEGION_IDS_ARE_64BIT on DMA branch

I'm getting a build error on the DMA branch when building with -DLEGION_IDS_ARE_64BIT

make DEBUG_FLAGS="-DTESTERIO_TIMERS -DHDF_LOCKS -DLEGION_IDS_ARE_64BIT"
mpicxx -o tester_io.o -c tester_io.cc -I/opt/local/include/ -I../../runtime/ -I../../runtime//realm -I../../runtime//greenlet -I/users/gshipman/local/include -I/users/gshipman/local/include/ibv-conduit -g -Wno-array-bounds -DTESTERIO_TIMERS -DHDF_LOCKS -DLEGION_IDS_ARE_64BIT -DUSE_DISK -march=native -DUSE_GASNET -DGASNETI_BUG1389_WORKAROUND=1 -DGASNET_CONDUIT_IBV -DUSE_HDF -DDEBUG_LOW_LEVEL -DDEBUG_HIGH_LEVEL -ggdb -DCOMPILE_TIME_MIN_LEVEL=LEVEL_DEBUG -Wall -Werror -Wno-format -Wno-sign-compare
mpicxx -o ../../runtime//lowlevel_disk.o -c ../../runtime//lowlevel_disk.cc -I/opt/local/include/ -I../../runtime/ -I../../runtime//realm -I../../runtime//greenlet -I/users/gshipman/local/include -I/users/gshipman/local/include/ibv-conduit -g -Wno-array-bounds -DTESTERIO_TIMERS -DHDF_LOCKS -DLEGION_IDS_ARE_64BIT -DUSE_DISK -march=native -DUSE_GASNET -DGASNETI_BUG1389_WORKAROUND=1 -DGASNET_CONDUIT_IBV -DUSE_HDF -DDEBUG_LOW_LEVEL -DDEBUG_HIGH_LEVEL -ggdb -DCOMPILE_TIME_MIN_LEVEL=LEVEL_DEBUG -Wall -Werror -Wno-format -Wno-sign-compare
mpicxx -o ../../runtime//realm/runtime_impl.o -c ../../runtime//realm/runtime_impl.cc -I/opt/local/include/ -I../../runtime/ -I../../runtime//realm -I../../runtime//greenlet -I/users/gshipman/local/include -I/users/gshipman/local/include/ibv-conduit -g -Wno-array-bounds -DTESTERIO_TIMERS -DHDF_LOCKS -DLEGION_IDS_ARE_64BIT -DUSE_DISK -march=native -DUSE_GASNET -DGASNETI_BUG1389_WORKAROUND=1 -DGASNET_CONDUIT_IBV -DUSE_HDF -DDEBUG_LOW_LEVEL -DDEBUG_HIGH_LEVEL -ggdb -DCOMPILE_TIME_MIN_LEVEL=LEVEL_DEBUG -Wall -Werror -Wno-format -Wno-sign-compare
In file included from ../../runtime//realm/runtime_impl.h:24,
from ../../runtime//realm/runtime_impl.cc:18:
../../runtime/activemsg.h: In static member function 'static int ActiveMessageMediumNoReply<MSGID, MSGTYPE, FNPTR>::add_handler_entries(gasnet_handlerentry_t*, const char*) [with int MSGID = 201, MSGTYPE = LegionRuntime::LowLevel::RemoteFillArgs, void (* FNPTR)(MSGTYPE, const void*, size_t) = LegionRuntime::LowLevel::handle_remote_fill]':
../../runtime//realm/runtime_impl.cc:436: instantiated from here
../../runtime/activemsg.h:620: error: invalid application of 'sizeof' to incomplete type 'MessageRawArgs<LegionRuntime::LowLevel::RemoteFillArgs, 201, dummy_short_handler [with MSGTYPE = LegionRuntime::LowLevel::RemoteFillArgs], LegionRuntime::LowLevel::handle_remote_fill, 18>'
../../runtime/activemsg.h:622: error: incomplete type 'MessageRawArgs<LegionRuntime::LowLevel::RemoteFillArgs, 201, dummy_short_handler [with MSGTYPE = LegionRuntime::LowLevel::RemoteFillArgs], LegionRuntime::LowLevel::handle_remote_fill, 18>' used in nested name specifier
make: *** [../../runtime//realm/runtime_impl.o] Error 1

dma: hang during use of HDF5 memory

transferring discussion from the mailing list:

From the backtrace information, it seems not all enqueued XferDes have been dequeued by the DMA threads before they fall asleep. One potential cause is that we are using std::set to implement the PriorityQueue, in which case xfer descriptors with the same priority might be merged into one. I have replaced std::set with std::multiset (pushed to the dma branch). Could you please rerun the test and see if there is still a hang?

Thanks for taking a look at this Zhihao. The hang is much less frequent now (I think I hit one hang in 200 runs), but it still seems to be here. I'm running the same hdf_attach_subregion_parallel test in the dma branch. The trace is below.

thread #1: tid = 0x59433, 0x00007fff9299748a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff9299748a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff95d7cf5d libsystem_c.dylib`nanosleep + 199
    frame #2: 0x00007fff95d7cdbd libsystem_c.dylib`sleep + 42
    frame #3: 0x000000010b6bd9f4 tester_io`LegionRuntime::LowLevel::RuntimeImpl::run(this=0x00007fe6f2505240, task_id=0, style=ONE_TASK_ONLY, args=0x0000000000000000, arglen=0, background=false) + 3924 at lowlevel.cc:10789
    frame #4: 0x000000010b6bca90 tester_io`Realm::Runtime::run(this=0x00007fff545a4438, task_id=0, style=ONE_TASK_ONLY, args=0x0000000000000000, arglen=0, background=false) + 80 at lowlevel.cc:9807
    frame #5: 0x000000010bd725b5 tester_io`LegionRuntime::HighLevel::Runtime::start(argc=1, argv=0x00007fff545a4d38, background=false) + 7797 at runtime.cc:15687
    frame #6: 0x000000010b8165ea tester_io`LegionRuntime::HighLevel::HighLevelRuntime::start(argc=3, argv=0x00007fff545a4d28, background=false) + 42 at legion.cc:3671
    frame #7: 0x000000010b67d3c0 tester_io`main(argc=3, argv=0x00007fff545a4d28) + 352 at tester_io.cc:244
    frame #8: 0x00007fff971145c9 libdyld.dylib`start + 1
    frame #9: 0x00007fff971145c9 libdyld.dylib`start + 1

  thread #2: tid = 0x59436, 0x00007fff929973fa libsystem_kernel.dylib`__select + 10
    frame #0: 0x00007fff929973fa libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010bf7e9c1 tester_io`inputWaiting(unsigned int) + 161
    frame #2: 0x000000010bf7b9ae tester_io`AMUDP_SPMDHandleControlTraffic + 78
    frame #3: 0x000000010bf7619d tester_io`AM_Poll + 77
    frame #4: 0x000000010bf07a22 tester_io`gasnetc_AMPoll + 50
    frame #5: 0x000000010b7771d1 tester_io`do_some_polling() [inlined] gasneti_AMPoll + 5 at gasnet_help.h:597
    frame #6: 0x000000010b7771cc tester_io`do_some_polling() [inlined] gasnet_AMPoll at gasnet_help.h:712
    frame #7: 0x000000010b7771cc tester_io`do_some_polling() + 28 at activemsg.cc:2057
    frame #8: 0x000000010b7775e6 tester_io`gasnet_poll_thread_loop(data=0x0000000000000000) + 38 at activemsg.cc:2073
    frame #9: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #10: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #11: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

  thread #3: tid = 0x59437, 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff938f8e0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x000000010b775f73 tester_io`IncomingMessageManager::get_messages(this=0x00007fe6f2504cd0, sender=0x000000010cb2aea4, wait=true) + 131 at activemsg.cc:686
    frame #3: 0x000000010b7760f1 tester_io`LegionRuntime::LowLevel::HandlerThread::thread_main(this=0x00007fe6f2505920) + 49 at activemsg.cc:725
    frame #4: 0x000000010b6a98b9 tester_io`LegionRuntime::LowLevel::PreemptableThread::thread_entry(data=0x00007fe6f2505920) + 153 at lowlevel.cc:6392
    frame #5: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

  thread #4: tid = 0x59438, 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff938f8e0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x000000010b761a5e tester_io`LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(this=0x00007fe6f2506070, dma_thread=0x00007fe6f2506130, wait_on_empty=true) + 1902 at channel.h:1078
    frame #3: 0x000000010b7528c9 tester_io`LegionRuntime::LowLevel::DMAThread::dma_thread_loop(this=0x00007fe6f2506130) + 1129 at channel.cc:1450
    frame #4: 0x000000010b7a8ebd tester_io`LegionRuntime::LowLevel::DMAThread::start(arg=0x00007fe6f2506130) + 29 at channel.h:974
    frame #5: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

  thread #5: tid = 0x59439, 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff938f8e0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x000000010b761a5e tester_io`LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(this=0x00007fe6f2506070, dma_thread=0x00007fe6f2506550, wait_on_empty=true) + 1902 at channel.h:1078
    frame #3: 0x000000010b7528c9 tester_io`LegionRuntime::LowLevel::DMAThread::dma_thread_loop(this=0x00007fe6f2506550) + 1129 at channel.cc:1450
    frame #4: 0x000000010b7a8ebd tester_io`LegionRuntime::LowLevel::DMAThread::start(arg=0x00007fe6f2506550) + 29 at channel.h:974
    frame #5: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

  thread #6: tid = 0x5943a, 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff938f8e0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x000000010b761a5e tester_io`LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(this=0x00007fe6f2506070, dma_thread=0x00007fe6f2506980, wait_on_empty=true) + 1902 at channel.h:1078
    frame #3: 0x000000010b7528c9 tester_io`LegionRuntime::LowLevel::DMAThread::dma_thread_loop(this=0x00007fe6f2506980) + 1129 at channel.cc:1450
    frame #4: 0x000000010b7a8ebd tester_io`LegionRuntime::LowLevel::DMAThread::start(arg=0x00007fe6f2506980) + 29 at channel.h:974
    frame #5: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

  thread #7: tid = 0x5943b, 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff938f8e0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x000000010b6c17ad tester_io`GASNetCondVar::wait(this=0x00007fe6f2505ac0) + 45 at activemsg.h:124
    frame #3: 0x000000010b6aa45b tester_io`LegionRuntime::LowLevel::GreenletProcessor::execute_task(this=0x00007fe6f2505a50) + 411 at lowlevel.cc:6650
    frame #4: 0x000000010b6aa287 tester_io`LegionRuntime::LowLevel::GreenletThread::thread_main(this=0x00007fe6f2507190) + 55 at lowlevel.cc:6467
    frame #5: 0x000000010b6a98b9 tester_io`LegionRuntime::LowLevel::PreemptableThread::thread_entry(data=0x00007fe6f2507190) + 153 at lowlevel.cc:6392
    frame #6: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #7: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #8: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

  thread #8: tid = 0x5943c, 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff92997136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff938f8e0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x000000010b6c17ad tester_io`GASNetCondVar::wait(this=0x00007fe6f2505c20) + 45 at activemsg.h:124
    frame #3: 0x000000010b6aa45b tester_io`LegionRuntime::LowLevel::GreenletProcessor::execute_task(this=0x00007fe6f2505bb0) + 411 at lowlevel.cc:6650
    frame #4: 0x000000010b6aa287 tester_io`LegionRuntime::LowLevel::GreenletThread::thread_main(this=0x00007fe6f25071b0) + 55 at lowlevel.cc:6467
    frame #5: 0x000000010b6a98b9 tester_io`LegionRuntime::LowLevel::PreemptableThread::thread_entry(data=0x00007fe6f25071b0) + 153 at lowlevel.cc:6392
    frame #6: 0x00007fff938f8268 libsystem_pthread.dylib`_pthread_body + 131
    frame #7: 0x00007fff938f81e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #8: 0x00007fff938f641d libsystem_pthread.dylib`thread_start + 13

must epoch error on full ghost

After the versioning branch was merged into the master branch, I'm occasionally seeing a failed constraint from the must epoch launch in the full ghost example. There has been a known issue with must epoch launches in the versioning branch, and I think this is the same issue, which still hasn't been fixed. I get this error more frequently when I oversubscribe the physical cores in the system (for example, running ./ghost -b 16 -ll:cpu 16 on only four processors). Here are the stack trace and the error message:

(lldb) target create "./ghost"
Current executable set to './ghost' (x86_64).
(lldb) settings set -- target.run-args "-b" "16" "-l" "10" "-ll:cpu" "16"
(lldb) run
Process 59746 launched: './ghost' (x86_64)
[0 - 7fff79f4b000] {4}{threads}: reservation ('CPU proc 80000002') cannot be satisfied
Running stencil computation for 1024 elements for 10 steps...
Partitioning data into 16 sub-regions...
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 2 of operation spmd (UID 4) is using uninitialized data for field(s) 2 of logical region (49,1,32)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 4) is using uninitialized data for field(s) 2 of logical region (20,1,3)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 5) is using uninitialized data for field(s) 2 of logical region (22,1,5)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 6) is using uninitialized data for field(s) 2 of logical region (24,1,7)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 7) is using uninitialized data for field(s) 2 of logical region (26,1,9)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 8) is using uninitialized data for field(s) 2 of logical region (28,1,11)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 9) is using uninitialized data for field(s) 2 of logical region (30,1,13)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 10) is using uninitialized data for field(s) 2 of logical region (32,1,15)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 11) is using uninitialized data for field(s) 2 of logical region (34,1,17)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 12) is using uninitialized data for field(s) 2 of logical region (36,1,19)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 13) is using uninitialized data for field(s) 2 of logical region (38,1,21)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 14) is using uninitialized data for field(s) 2 of logical region (40,1,23)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 15) is using uninitialized data for field(s) 2 of logical region (42,1,25)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 16) is using uninitialized data for field(s) 2 of logical region (44,1,27)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 17) is using uninitialized data for field(s) 2 of logical region (46,1,29)
[0 - 700000086000] {4}{runtime}: WARNING: Region requirement 3 of operation spmd (UID 18) is using uninitialized data for field(s) 2 of logical region (48,1,31)
[0 - 700000086000] {5}{runtime}: MUST EPOCH ERROR: failed constraint! Task spmd (ID 5) mapped region 0 to instance e0000000 in memory 60000000 , but task spmd (ID 4) mapped region 3 to instance e0000007 in memory 60000000.
Assertion failed: (false), function trigger_execution, file /Users/wclee/Workspace/stanford/projects/legion//runtime/legion/legion_ops.cc, line 7084.
Process 59746 stopped

  thread #2: tid = 0x80a101, 0x00007fff8e84f0ae libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGABRT
    frame #0: 0x00007fff8e84f0ae libsystem_kernel.dylib`__pthread_kill + 10
    libsystem_kernel.dylib`__pthread_kill:
->  0x7fff8e84f0ae <+10>: jae    0x7fff8e84f0b8            ; <+20>
    0x7fff8e84f0b0 <+12>: movq   %rax, %rdi
    0x7fff8e84f0b3 <+15>: jmp    0x7fff8e84a3ef            ; cerror_nocancel
    0x7fff8e84f0b8 <+20>: retq

(lldb) bt
  thread #2: tid = 0x80a101, 0x00007fff8e84f0ae libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGABRT
    frame #0: 0x00007fff8e84f0ae libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff8e654500 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff9813937b libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff981009c4 libsystem_c.dylib`__assert_rtn + 321
    frame #4: 0x000000010008a8da ghost`LegionRuntime::HighLevel::MustEpochOp::trigger_execution(this=0x00000001018233d0) + 7930 at legion_ops.cc:7084
    frame #5: 0x00000001005f18b3 ghost`LegionRuntime::HighLevel::Runtime::high_level_runtime_task(args=0x0000000100f342c0, arglen=20, p=(id = 2147483648)) + 1123 at runtime.cc:15895
    frame #6: 0x00000001008247e9 ghost`Realm::Task::execute_on_processor(this=0x0000000100f31bf0, p=(id = 2147483648)) + 297 at tasks.cc:80
    frame #7: 0x0000000100829089 ghost`Realm::UserThreadTaskScheduler::execute_task(this=0x0000000101801610, task=0x0000000100f31bf0) + 41 at tasks.cc:884
    frame #8: 0x00000001008267a4 ghost`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x0000000101801610) + 1668 at tasks.cc:448
    frame #9: 0x0000000100846641 ghost`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &(Realm::ThreadedTaskScheduler::scheduler_loop())>(obj=0x0000000101801610) + 97 at threads.inl:127
    frame #10: 0x00000001007feb16 ghost`Realm::UserThread::uthread_entry() + 358 at threads.cc:740
    frame #11: 0x00007fff8929c33b libsystem_platform.dylib`_ctx_start + 11

assertion failure in check_for_catchup

In latest dma branch I see the periodic failure:

Assertion failed: (owner != gasnet_mynode()), function check_for_catchup, file /Users/nwatkins/src/legion/runtime/realm/event_impl.cc, line 515.

Running `GASNET_BACKTRACE=1 GASNET_MASTERIP='127.0.0.1' GASNET_SPAWN=-L SSH_SERVERS="localhost localhost" amudprun -np 2 ./tester_io` in test/hdf_attach_subregion_parallel.

The stack trace is so far evading capture. Will update when possible.

Thrust v1.8.0 interop

The thrust_interop example fails to compile on Ubuntu 14.04 with NVCC release 7.0, V7.0.17 when using Thrust v1.8.0 and above.

Runtime functions referenced:
cudaEventDestroy
cudaEventCreateWithFlags
cudaEventRecord
cudaDeviceGetAttribute
cudaStreamWaitEvent

Top-level files referenced:

thrust/system/cuda/detail/bulk/detail/cuda_launcher/runtime_introspection.inl (lines 43-47)
thrust/system/cuda/detail/bulk/detail/async.inl (lines 39 & 70)

Realm Registered Memory Failure at Start-up Without GASNet

Asking Realm for registered memory without GASNet results in a start-up assertion. Here is the relevant part of the backtrace. The problem is that 'prealloc_base' is NULL, but that is only allowed if the '_registered' parameter is false, which it is not in this case.
#7  0x00007f9db18bcb22 in __GI___assert_fail (assertion=0x7f9db1475d64 "!_registered", file=0x7f9db1475b00 "/home/mebauer/extern_legion/legion/runtime//realm/mem_impl.cc", line=467,
    function=0x7f9db1476c20 <Realm::LocalCPUMemory::LocalCPUMemory(Realm::Memory, unsigned long, void*, bool)::__PRETTY_FUNCTION__> "Realm::LocalCPUMemory::LocalCPUMemory(Realm::Memory, size_t, void*, bool)") at assert.c:101
#8  0x00007f9db1189810 in Realm::LocalCPUMemory::LocalCPUMemory (this=0x681fa50, _me=..., _size=8589934592, prealloc_base=0x0, _registered=true)
    at /home/mebauer/extern_legion/legion/runtime//realm/mem_impl.cc:467
#9  0x00007f9db11a2fec in Realm::RuntimeImpl::init (this=0x5d55df0, argc=0x7fff7e561b2c, argv=0x7fff7e561b20) at /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:636
#10 0x00007f9db11a134b in Realm::Runtime::init (this=0x7fff7e561b40, argc=0x7fff7e561b2c, argv=0x7fff7e561b20) at /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:108

Assertion failure in legion_analysis.cc

I encountered the following assertion failure when I ran a testcase (test/attach-file) on the attach-file branch. The testcase is almost identical to examples/08_multiple_partitions, except that it checkpoints the stencil logical region into a persistent file.

multiple_partitions: ../../runtime//legion/legion_analysis.cc:2392: void LegionRuntime::HighLevel::CurrentState::reset(): Assertion `!has_persistent' failed.

[0] Thread 3 (Thread 0x7f2edea09700 (LWP 10259)):
[0] #0 0x00007f2edff8a619 in __libc_waitpid (pid=10261, stat_loc=stat_loc@entry=0x7f2ea25f8c10, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
[0] #1 0x00007f2edff0f1d2 in do_system (line=) at ../sysdeps/posix/system.c:148
[0] #2 0x0000000000da2978 in gasneti_bt_gdb ()
[0] #3 0x0000000000da57db in gasneti_print_backtrace ()
[0] #4 0x0000000000dfe311 in gasneti_defaultSignalHandler ()
[0] #5
[0] #6 0x00007f2edfeffbb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
[0] #7 0x00007f2edff02fc8 in __GI_abort () at abort.c:89
[0] #8 0x00007f2edfef8a76 in __assert_fail_base (fmt=0x7f2ee004a370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xe2e047 "!has_persistent", file=file@entry=0xe2da78 "../../runtime//legion/legion_analysis.cc", line=line@entry=2392, function=function@entry=0xe315e0 <LegionRuntime::HighLevel::CurrentState::reset()::__PRETTY_FUNCTION__> "void LegionRuntime::HighLevel::CurrentState::reset()") at assert.c:92
[0] #9 0x00007f2edfef8b22 in __GI___assert_fail (assertion=0xe2e047 "!has_persistent", file=0xe2da78 "../../runtime//legion/legion_analysis.cc", line=2392, function=0xe315e0 <LegionRuntime::HighLevel::CurrentState::reset()::__PRETTY_FUNCTION__> "void LegionRuntime::HighLevel::CurrentState::reset()") at assert.c:101
[0] #10 0x0000000000b1232e in LegionRuntime::HighLevel::CurrentState::reset (this=0x7f2ede955320) at ../../runtime//legion/legion_analysis.cc:2392
[0] #11 0x0000000000b8597b in LegionRuntime::HighLevel::RegionTreeNode::invalidate_current_state (this=0x7f2ede9071a0, ctx=0, logical_users_only=false) at ../../runtime//legion/region_tree.cc:13609
[0] #12 0x0000000000b0e3ec in LegionRuntime::HighLevel::CurrentInvalidator::visit_region (this=0x7f2ea25fb6f0, node=0x7f2ede9071a0) at ../../runtime//legion/legion_analysis.cc:1461
[0] #13 0x0000000000b87131 in LegionRuntime::HighLevel::RegionNode::visit_node (this=0x7f2ede9071a0, traverser=0x7f2ea25fb6f0) at ../../runtime//legion/region_tree.cc:14337
[0] #14 0x0000000000b57432 in LegionRuntime::HighLevel::RegionTreeForest::invalidate_current_context (this=0x1cf5400, ctx=..., handle=..., logical_users_only=false) at ../../runtime//legion/region_tree.cc:2129
[0] #15 0x0000000000a59735 in LegionRuntime::HighLevel::RemoteTask::deactivate (this=0x7f2ede9033c0) at ../../runtime//legion/legion_tasks.cc:7935
[0] #16 0x0000000000a53977 in LegionRuntime::HighLevel::IndividualTask::deactivate (this=0x7f2ede9010c0) at ../../runtime//legion/legion_tasks.cc:6086
[0] #17 0x0000000000a5684d in LegionRuntime::HighLevel::IndividualTask::trigger_task_commit (this=0x7f2ede9010c0) at ../../runtime//legion/legion_tasks.cc:6889
[0] #18 0x0000000000a43895 in LegionRuntime::HighLevel::TaskOp::trigger_children_committed (this=0x7f2ede9010c0) at ../../runtime//legion/legion_tasks.cc:1918
[0] #19 0x0000000000a46d28 in LegionRuntime::HighLevel::SingleTask::register_child_commit (this=0x7f2ede9010c0, op=0x7f2ede971978) at ../../runtime//legion/legion_tasks.cc:2736
[0] #20 0x00000000009f7e31 in LegionRuntime::HighLevel::Operation::commit_operation (this=0x7f2ede971978) at ../../runtime//legion/legion_ops.cc:518
[0] #21 0x0000000000a56836 in LegionRuntime::HighLevel::IndividualTask::trigger_task_commit (this=0x7f2ede9717e0) at ../../runtime//legion/legion_tasks.cc:6887
[0] #22 0x0000000000a3db1f in LegionRuntime::HighLevel::TaskOp::trigger_commit (this=0x7f2ede9717e0) at ../../runtime//legion/legion_tasks.cc:502
[0] #23 0x00000000009f7dc4 in LegionRuntime::HighLevel::Operation::complete_operation (this=0x7f2ede971978) at ../../runtime//legion/legion_ops.cc:504
[0] #24 0x0000000000a566fb in LegionRuntime::HighLevel::IndividualTask::trigger_task_complete (this=0x7f2ede9717e0) at ../../runtime//legion/legion_tasks.cc:6866
[0] #25 0x0000000000a3da45 in LegionRuntime::HighLevel::TaskOp::trigger_complete (this=0x7f2ede9717e0) at ../../runtime//legion/legion_tasks.cc:482
[0] #26 0x00000000009f797c in LegionRuntime::HighLevel::Operation::complete_execution (this=0x7f2ede971978, wait_on=...) at ../../runtime//legion/legion_ops.cc:425
[0] #27 0x0000000000c0ace1 in LegionRuntime::HighLevel::Runtime::high_level_runtime_task (args=0x7f2ea1a04120, arglen=12, p=...) at ../../runtime//legion/runtime.cc:15959
[0] #28 0x0000000000d0415c in Realm::Task::execute_on_processor (this=0x7f2ea1a0d8c0, p=...) at ../../runtime//realm/tasks.cc:80
[0] #29 0x0000000000d063de in Realm::UserThreadTaskScheduler::execute_task (this=0x1cec920, task=0x7f2ea1a0d8c0) at ../../runtime//realm/tasks.cc:887
[0] #30 0x0000000000d04e9f in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x1cec920) at ../../runtime//realm/tasks.cc:448
[0] #31 0x0000000000d09bd6 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x1cec920) at ../../runtime//realm/threads.inl:127
[0] #32 0x0000000000cf1c57 in Realm::UserThread::uthread_entry () at ../../runtime//realm/threads.cc:740
[0] #33 0x00007f2edff127a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[0] #34 0x0000000000000000 in ?? ()

Compilation issue with -DLEGION_IDS_ARE_64BIT

I see some compilation issues with the general low-level runtime due to active message arguments being larger than 64 bytes when compiling with the -DLEGION_IDS_ARE_64BIT flag. It should happen when trying to compile any program with GASNet enabled. The active message arguments need 18 4-byte words, but a maximum of 16 is supported by GASNet. Here is an example error message:

In file included from /home/mebauer/extern_legion/legion/runtime/activemsg.h:22:0,
from /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.h:24,
from /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:18:
/home/mebauer/extern_legion/legion/runtime/activemsg.h: In instantiation of 'static int ActiveMessageMediumNoReply<MSGID, MSGTYPE, FNPTR>::add_handler_entries(gasnet_handlerentry_t*, const char*) [with int MSGID = 201; MSGTYPE = LegionRuntime::LowLevel::RemoteFillArgs; void (* FNPTR)(MSGTYPE, const void*, size_t) = LegionRuntime::LowLevel::handle_remote_fill; gasnet_handlerentry_t = gasneti_handlerentry_s]':
/home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:421:36: required from here
/home/mebauer/extern_legion/legion/runtime/activemsg.h:615:18: error: invalid application of 'sizeof' to incomplete type 'ActiveMessageMediumNoReply<201, LegionRuntime::LowLevel::RemoteFillArgs, LegionRuntime::LowLevel::handle_remote_fill>::MessageRawArgsType {aka MessageRawArgs<LegionRuntime::LowLevel::RemoteFillArgs, 201, dummy_short_handler<LegionRuntime::LowLevel::RemoteFillArgs>, LegionRuntime::LowLevel::handle_remote_fill, 18>}'
assert(sizeof(MessageRawArgsType) <= 64); // max of 16 4-byte args
^
In file included from /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.h:24:0,
from /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:18:
/home/mebauer/extern_legion/legion/runtime/activemsg.h:617:22: error: incomplete type 'ActiveMessageMediumNoReply<201, LegionRuntime::LowLevel::RemoteFillArgs, LegionRuntime::LowLevel::handle_remote_fill>::MessageRawArgsType {aka MessageRawArgs<LegionRuntime::LowLevel::RemoteFillArgs, 201, dummy_short_handler<LegionRuntime::LowLevel::RemoteFillArgs>, LegionRuntime::LowLevel::handle_remote_fill, 18>}' used in nested name specifier
entries[0].fnptr = (void (*)()) (MessageRawArgsType::handler_medium);

Invalid remove version state bug

Running the regent miniaero with a release binary on two nodes raises the following assertion failure:

terra: /home/wclee/Workspace/legion/runtime/legion/garbage_collection.cc:665: bool LegionRuntime::HighLevel::DistributedCollectable::update_state(bool&, bool&, bool&, bool&, bool&): Assertion `false' failed.

However, this bug is not reproducible with a debug binary; the binding library needs to be compiled manually with the options "CC_FLAGS='-g' DEBUG=0".

Here is the stacktrace:
#0 0x00007fb063f6c9bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fb063f6c854 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x00007fb063aa2ffa in Realm::realm_freeze (signal=6) at /home/wclee/Workspace/legion/runtime/realm/runtime_impl.cc:75
#3
#4 0x00007fb063ee1bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5 0x00007fb063ee4fc8 in __GI_abort () at abort.c:89
#6 0x00007fb063edaa76 in __assert_fail_base (fmt=0x7fb06402c370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fb063bd71c9 "false", file=file@entry=0x7fb063bc9bd8 "/home/wclee/Workspace/legion/runtime/legion/garbage_collection.cc", line=line@entry=665,

function=function@entry=0x7fb063bc9c40 <LegionRuntime::HighLevel::DistributedCollectable::update_state(bool&, bool&, bool&, bool&, bool&)::__PRETTY_FUNCTION__> "bool LegionRuntime::HighLevel::DistributedCollectable::update_state(bool&, bool&, bool&, bool&, bool&)") at assert.c:92

#7 0x00007fb063edab22 in __GI___assert_fail (assertion=0x7fb063bd71c9 "false", file=0x7fb063bc9bd8 "/home/wclee/Workspace/legion/runtime/legion/garbage_collection.cc", line=665,

function=0x7fb063bc9c40 <LegionRuntime::HighLevel::DistributedCollectable::update_state(bool&, bool&, bool&, bool&, bool&)::__PRETTY_FUNCTION__> "bool LegionRuntime::HighLevel::DistributedCollectable::update_state(bool&, bool&, bool&, bool&, bool&)") at assert.c:101

#8 0x00007fb063a9378f in LegionRuntime::HighLevel::DistributedCollectable::update_state (this=this@entry=0x7fb010adc8a0, need_activate=@0x7fb0101fff8b: false, need_validate=@0x7fb0101fff8c: false, need_invalidate=@0x7fb0101fff8d: false, need_deactivate=@0x7fb0101fff8e: false, do_deletion=@0x7fb0101fff8f: false)

at /home/wclee/Workspace/legion/runtime/legion/garbage_collection.cc:665

#9 0x00007fb063a938ea in LegionRuntime::HighLevel::DistributedCollectable::add_gc_reference (this=0x7fb010adc8a0, cnt=cnt@entry=1) at /home/wclee/Workspace/legion/runtime/legion/garbage_collection.cc:115
#10 0x00007fb0639cc7fe in add_nested_gc_ref (source=, cnt=1, this=) at /home/wclee/Workspace/legion/runtime/legion/garbage_collection.h:343
#11 LegionRuntime::HighLevel::VersionState::notify_active (this=0x7fb010b0ec60) at /home/wclee/Workspace/legion/runtime/legion/legion_analysis.cc:5205
#12 0x00007fb063a9394b in LegionRuntime::HighLevel::DistributedCollectable::add_gc_reference (this=0x7fb010b0ec60, cnt=cnt@entry=1) at /home/wclee/Workspace/legion/runtime/legion/garbage_collection.cc:101
#13 0x00007fb0639d86db in add_base_gc_ref (source=LegionRuntime::HighLevel::PHYSICAL_STATE_REF, cnt=1, this=) at /home/wclee/Workspace/legion/runtime/legion/garbage_collection.h:332
#14 LegionRuntime::HighLevel::PhysicalState::add_version_state (this=, state=0x7fb010b0ec60, state_mask=...) at /home/wclee/Workspace/legion/runtime/legion/legion_analysis.cc:4008
#15 0x00007fb0639d9d72 in LegionRuntime::HighLevel::VersionInfo::unpack_node_info (this=this@entry=0x7fb010462480, node=0x7fb010a99ea0, ctx=ctx@entry=0, derez=..., source=source@entry=0) at /home/wclee/Workspace/legion/runtime/legion/legion_analysis.cc:1061
#16 0x00007fb0639d9f88 in LegionRuntime::HighLevel::VersionInfo::unpack_buffer (this=0x7fb010462480, forest=0x7fb010204a20, ctx=0) at /home/wclee/Workspace/legion/runtime/legion/legion_analysis.cc:918
#17 0x00007fb0639ebcca in LegionRuntime::HighLevel::VersionInfo::make_local (this=0x7fb010462480, preconditions=std::set with 0 elements, forest=, ctx=) at /home/wclee/Workspace/legion/runtime/legion/legion_analysis.cc:672
#18 0x00007fb063966847 in LegionRuntime::HighLevel::SliceTask::trigger_remote_state_analysis (this=0x7fb0104615c0, ready_event=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9896
#19 0x00007fb063a7e0fe in LegionRuntime::HighLevel::Runtime::high_level_runtime_task (args=0x7fb010aa04a0, arglen=, p=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:16019
#20 0x00007fb063ae54a6 in Realm::Task::execute_on_processor (this=0x7fb010b5e420, p=...) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:80
#21 0x00007fb063ae55a5 in Realm::UserThreadTaskScheduler::execute_task (this=, task=) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:887
#22 0x00007fb063ae6538 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x3dd9000) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:448
#23 0x00007fb063ad6501 in Realm::UserThread::uthread_entry () at /home/wclee/Workspace/legion/runtime/realm/threads.cc:740
#24 0x00007fb063ef47a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#25 0x0000000000000000 in ?? ()

assertion failures in DynamicTable

@lightsighter I'm seeing the following assertion failures. It is somewhat inconsistent, in that the same invocation may work with 1 node but not with 2. The only notable thing is that in these runs there is a single logical region that is large (e.g. > 4B elements, but square, so say > 2^16 x 2^16) partitioned into much smaller pieces.

tester_io: /home/nwatkins/src/legion/runtime//legion_utilities.h:8391: bool LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::has_entry(LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::IT) const [with ALLOCATOR = LegionRuntime::HighLevel::DynamicTableAllocator<LegionRuntime::HighLevel::PhysicalState, 10ul, 8ul>; LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::IT = int]: Assertion `(child != 0) && (child->level == (n->level -1)) && (index >= child->first_index) && (index <= child->last_index)' failed.
tester_io: /home/nwatkins/src/legion/runtime//legion_utilities.h:8559: LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::NodeBase* LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::lookup_leaf(LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::IT) [with ALLOCATOR = LegionRuntime::HighLevel::DynamicTableAllocator<LegionRuntime::HighLevel::PhysicalState, 10ul, 8ul>; LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::NodeBase = LegionRuntime::HighLevel::DynamicTableNodeBase<int>; typename ALLOCATOR::IT = int; LegionRuntime::HighLevel::DynamicTable<ALLOCATOR>::IT = int]: Assertion `(child != 0) && (child->level == (n->level - 1)) && (index >= child->first_index) && (index <= child->last_index)' failed.

raw_dense_ptr error when using shared_lowlevel

There is a linking error when using raw_dense_ptr with shared_lowlevel:

g++ -o raw_dense_ptr.o -c raw_dense_ptr.cc  -I/home/steve/legion/runtime -I/home/steve/legion/runtime/realm -I/home/steve/legion/runtime/legion -I/home/steve/legion/runtime/mappers  -march=native -DDEBUG_LOW_LEVEL -DDEBUG_HIGH_LEVEL -ggdb  -DCOMPILE_TIME_MIN_LEVEL=LEVEL_DEBUG   -Wall -Wno-strict-overflow -Werror -DSHARED_LOWLEVEL
---> Linking objects into one binary: raw_dense_ptr
g++ -o raw_dense_ptr raw_dense_ptr.o  -L. -llegion -lsharedllr  -lrt -lpthread -ldl -rdynamic 
raw_dense_ptr.o: In function `double* LegionRuntime::Accessor::AccessorType::Generic::Typed<double, double>::raw_dense_ptr<1>(LegionRuntime::Arrays::Rect<1> const&, LegionRuntime::Arrays::Rect<1>&, LegionRuntime::Accessor::ByteOffset&)':
/home/steve/legion/runtime/accessor.h:315: undefined reference to `void* LegionRuntime::Accessor::AccessorType::Generic::Untyped::raw_dense_ptr<1>(LegionRuntime::Arrays::Rect<1> const&, LegionRuntime::Arrays::Rect<1>&, LegionRuntime::Accessor::ByteOffset&)'
collect2: error: ld returned 1 exit status
make: *** [raw_dense_ptr] Error 1
#include <cstdio>
#include <cassert>
#include <cstdlib>
#include "legion.h"
using namespace LegionRuntime::HighLevel;
using namespace LegionRuntime::Accessor;

enum TaskIDs {
    TOP_LEVEL_TASK_ID,
};

enum FieldIDs {
    FID_X,
};

void top_level_task(const Task *task,
                    const std::vector<PhysicalRegion> &regions,
                    Context ctx, HighLevelRuntime *runtime)
{
    Rect<1> elem_rect(Point<1>(0),Point<1>(10));
    IndexSpace is = runtime->create_index_space(ctx,
                    Domain::from_rect<1>(elem_rect));
    FieldSpace input_fs = runtime->create_field_space(ctx);
    {
        FieldAllocator allocator =
            runtime->create_field_allocator(ctx, input_fs);
        allocator.allocate_field(sizeof(double),FID_X);
    }
    LogicalRegion input_lr = runtime->create_logical_region(ctx, is, input_fs);

    RegionRequirement req(input_lr, READ_WRITE, EXCLUSIVE, input_lr);
    req.add_field(FID_X);
    InlineLauncher input_launcher(req);
    PhysicalRegion input_region = runtime->map_region(ctx, input_launcher);
    input_region.wait_until_valid();

    RegionAccessor<AccessorType::Generic, double> acc_x =
        input_region.get_field_accessor(FID_X).typeify<double>();

    Rect<1> subrect;
    ByteOffset stride;

    double *ptr = acc_x.template raw_dense_ptr<1>(elem_rect, subrect, stride);
    assert((ptr != NULL) && (elem_rect == subrect) && (stride.offset == sizeof(double)));

    // Clean up all our data structures.
    runtime->destroy_logical_region(ctx, input_lr);
    runtime->destroy_field_space(ctx, input_fs);
    runtime->destroy_index_space(ctx, is);
}

int main(int argc, char **argv)
{
    HighLevelRuntime::set_top_level_task_id(TOP_LEVEL_TASK_ID);
    HighLevelRuntime::register_legion_task<top_level_task>(TOP_LEVEL_TASK_ID,
            Processor::LOC_PROC, true, false);

    return HighLevelRuntime::start(argc, argv);
}

dma: infinite loop scaling hdf_attach_subregion_parallel (with partial fix)

I'm working on scaling the hdf_attach_subregion_parallel example. It works fine up to 2^19 elements, but at 2^20 elements the dma_thread sits in a CPU-bound loop. The following patch fixes the problem, but it seems this solution is only a band-aid for scaling. Perhaps the completed requests aren't being reclaimed?

diff --git a/runtime/lowlevel_dma.cc b/runtime/lowlevel_dma.cc
index 4645c7e..4888ec4 100644
--- a/runtime/lowlevel_dma.cc
+++ b/runtime/lowlevel_dma.cc
@@ -2759,7 +2759,7 @@ namespace LegionRuntime {
             log_dma.info("create mem->hdf xferdes\n");
             XferDes* xd = new HDFXferDes<DIM>(channel_manager->get_hdf_write_channel(), false,
                                               src_buf, dst_buf, src_mem_base, hdf_metadata,
-                                              domain, oasvec, 100/*max_nr*/,
+                                              domain, oasvec, 500/*max_nr*/,
                                               XferOrder::DST_FIFO, XferDes::XFER_HDF_WRITE);
             path.push_back(xd);
             break;
@@ -2970,7 +2970,7 @@ namespace LegionRuntime {
             XferDes* xd = new HDFXferDes<DIM>(channel_manager->get_hdf_read_channel(), false,
                                               src_buf, dst_buf, dst_mem_base, src_hdf_metadata,
                                               domain, oasvec,
-                                              100/*max_nr*/, Layouts::XferOrder::SRC_FIFO, XferDes::XFER_HDF_READ);
+                                              500/*max_nr*/, Layouts::XferOrder::SRC_FIFO, XferDes::XFER_HDF_READ);
             path.push_back(xd);
             break;
           }
diff --git a/runtime/realm/runtime_impl.cc b/runtime/realm/runtime_impl.cc
index ffc4fe0..68eff83 100644
--- a/runtime/realm/runtime_impl.cc
+++ b/runtime/realm/runtime_impl.cc
@@ -989,7 +989,7 @@ namespace Realm {

       // start dma system at the very ending of initialization
       // since we need list of local gpus to create channels
-      LegionRuntime::LowLevel::start_dma_system(dma_worker_threads, 100
+      LegionRuntime::LowLevel::start_dma_system(dma_worker_threads, 500
 #ifdef USE_CUDA
                        ,local_gpus
 #endif

Better Legion error message when no processors of a given type exist

When you run the thrust_interop example without -ll:gpu 1, you get:

thrust_interop: ../../runtime/legion/runtime.cc:10522: LegionRuntime::HighLevel::AddressSpaceID 
LegionRuntime::HighLevel::Runtime::find_address_space(LegionRuntime::HighLevel::Processor) 
const: Assertion `finder != proc_spaces.end()' failed.

I believe this occurs because a task is being launched that has only a GPU implementation, and neither the default mapper nor the runtime notices when no GPU processors exist. This is a fairly common mistake for users running a Legion application, so a better error message would be nice to have.

all classes/structs used by external Legion and Realm APIs should be ostream-able

This is hugely useful for debugging, especially when you're trying to help somebody in an environment where interactive debugging is problematic. It's also a bunch of work, so this will require incremental work over time to achieve.

(Looks like I can't assign this to multiple people, so I'll keep ownership of this for now.)

Segfault when serializing and deserializing a SSETLBitMask<T>

I found an issue on a machine which supports SSE2 instructions but not AVX, along with an issue in the serializer. Both template<typename T> inline void Deserializer::deserialize(T &element) and template<typename T> inline void Serializer::serialize(const T &element) segfault when trying to call SSETLBitMask<1024u>::operator=(). I'm assuming an alignment issue occurs when a char buffer is cast to an SSETLBitMask<1024u>*, which causes the operator=() to fail.

I have applied a hack-ish fix by changing *((T*)(buffer+index)) = element; to memcpy(buffer+index, &element, sizeof(T)); and vice versa for the deserialize.

use of uninitialized incoming message manager

I'm seeing the following assertion failure in the dma branch.

Pinned Memory Usage: GASNET=256, RMEM=0, LMB=4, SDP=64, total=324
Assertion failed: (incoming_message_manager != 0), function enqueue_incoming, file /Users/nwatkins/src/legion/runtime/activemsg.cc, line 717.
*** Caught a fatal signal: SIGABRT(6) on node 0/2
[0] 0: 0   tester_io                           0x000000010fdef265 gasneti_bt_execinfo + 37 
[0] 1: 1   tester_io                           0x000000010fdec375 gasneti_print_backtrace + 581 
[0] 2: 2   tester_io                           0x000000010fe47f7c gasneti_defaultSignalHandler + 268 
[0] 3: 3   libsystem_platform.dylib            0x00007fff8e6b5f1a _sigtramp + 26 
[0] 4: 4   ???                                 0x0000000000000000 0x0 + 0 
[0] 5: 5   libsystem_c.dylib                   0x00007fff9286cb53 abort + 129 
[0] 6: 6   libsystem_c.dylib                   0x00007fff92834c39 basename + 0 
[0] 7: 7   tester_io                           0x000000010f595a98 _Z16enqueue_incomingtP15IncomingMessage + 88 
[0] 8: 8   tester_io                           0x000000010f6b087a _ZN14MessageRawArgsIN5Realm19NodeAnnounceMessage11RequestArgsELi141EXadL_Z19dummy_short_handlerIS2_EvT_EEXadL_ZNS1_14handle_requestES2_PKvmEELi8EE14handler_mediumEPvS8_miiiiiiii + 442 
[0] 9: 9   tester_io                           0x000000010fe524b4 _Z19AMUDP_processPacketP9amudp_bufi + 4884 
[0] 10: 10  tester_io                           0x000000010fe53c15 AM_Poll + 341 
[0] 11: 11  tester_io                           0x000000010fde5392 gasnetc_AMPoll + 50 
[0] 12: 12  tester_io                           0x000000010f596c31 _Z15do_some_pollingv + 33 
[0] 13: 13  tester_io                           0x000000010f597046 _ZL23gasnet_poll_thread_loopPv + 38 
[0] 14: 14  libsystem_pthread.dylib             0x00007fff90409268 _pthread_body + 131 
[0] 15: 15  libsystem_pthread.dylib             0x00007fff904091e5 _pthread_body + 0 
[0] 16: 16  libsystem_pthread.dylib             0x00007fff9040741d thread_start + 13 

Realm Multi-Node Start-up Crash

Caught this crash in start-up in Realm. I've only seen it one time. It looks like someone resized the vector while the loop on machine_impl.cc:247 was iterating over it. However, I couldn't tell which thread modified it because it was long gone by the time the signal for the segfault was raised. Since this happened in start-up, it should be a pretty small range of suspects. No application code had started yet.
#0  0x00007f650b98e9bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f650b98e854 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x0000000000cd6e2e in Realm::realm_freeze (signal=11) at /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:75
#3  <signal handler called>
#4  0x0000000000aa618a in Realm::Processor::operator!= (this=0x5f70000, rhs=...) at /home/mebauer/extern_legion/legion/runtime/realm/processor.h:41
#5  0x0000000000d6f081 in Realm::MachineImpl::get_proc_mem_affinity (this=0x31831e0, result=..., restrict_proc=..., restrict_memory=...) at /home/mebauer/extern_legion/legion/runtime//realm/machine_impl.cc:247
#6  0x0000000000cda5dc in Realm::RuntimeImpl::init (this=0x3172d60, argc=0x7fff1991341c, argv=0x7fff19913410) at /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:889
#7  0x0000000000cd6efe in Realm::Runtime::init (this=0x7fff19913430, argc=0x7fff1991341c, argv=0x7fff19913410) at /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:106
#8  0x0000000000c44bea in LegionRuntime::HighLevel::Runtime::start (argc=28, argv=0x7fff19913618, background=false) at /home/mebauer/extern_legion/legion/runtime//legion/runtime.cc:14737
#9  0x0000000000a35b69 in LegionRuntime::HighLevel::HighLevelRuntime::start (argc=28, argv=0x7fff19913618, background=false) at /home/mebauer/extern_legion/legion/runtime//legion/legion.cc:3658
#10 0x00000000009f0df5 in main (argc=28, argv=0x7fff19913618) at main.cc:96

Assertion in reduce_index_spaces when computing index space intersections over a pending partition

I'm getting an assertion in reduce_index_spaces when using create_index_space_intersection on a pending partition.

Here is the backtrace:
(lldb) bt
* thread #5: tid = 0x68f5a, 0x00007fff9acab286 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGABRT
    frame #0: 0x00007fff9acab286 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff98ff89f9 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff9909e9b3 libsystem_c.dylib`abort + 129
    frame #3: 0x00007fff99066a99 libsystem_c.dylib`__assert_rtn + 321
  * frame #4: 0x000000010b3ddf9d pennant`Realm::IndexSpace::reduce_index_spaces(op=ISO_INTERSECT, spaces=0x000000012c4f4f30, result=0x000000012c4f4ea0, mutable_results=false, parent=(id = 2684354561), wait_on=(id = 0, gen = 0)) + 77 at idx_impl.cc:208
    frame #5: 0x000000010afba361 pennant`LegionRuntime::HighLevel::RegionTreeForest::compute_pending_space(this=0x00007f9993406d60, target=, handles=0x00007f999379f890, is_union=false) + 2001 at region_tree.cc:824
    frame #6: 0x000000010ad1696f pennant`LegionRuntime::HighLevel::PendingPartitionOp::ComputePendingSpace::perform(this=0x00007f999379f870, forest=0x00007f9993406d60) + 207 at legion_ops.h:1552
    frame #7: 0x000000010ace094a pennant`LegionRuntime::HighLevel::PendingPartitionOp::trigger_execution(this=0x00007f999379f760) + 58 at legion_ops.cc:7995
    frame #8: 0x000000010b185ab4 pennant`LegionRuntime::HighLevel::Runtime::high_level_runtime_task(args=0x00007f999350d1d0, arglen=20, p=(id = 2147483648)) + 1060 at runtime.cc:15895
    frame #9: 0x000000010b3798f0 pennant`Realm::Task::execute_on_processor(this=0x00007f999350c240, p=(id = 2147483648)) + 336 at tasks.cc:80
    frame #10: 0x000000010b37dd09 pennant`Realm::UserThreadTaskScheduler::execute_task(this=0x00007f9993600cb0, task=0x00007f999350c240) + 41 at tasks.cc:884
    frame #11: 0x000000010b37b580 pennant`Realm::ThreadedTaskScheduler::scheduler_loop(this=0x00007f9993600cb0) + 1840 at tasks.cc:448
    frame #12: 0x000000010b380c8e pennant`void Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &(obj=0x00007f9993600cb0))>(void*) + 94 at threads.inl:127

Stale Index-Space Meta-Data Results in Remote Instance Overallocation

The low-level runtime is using stale meta-data when doing remote instance allocations, which results in very large over-allocations for physical regions using unstructured index spaces. Example from mini-aero: on node 0, when looking at the element mask for an index sub-space, I see the following:

(gdb) p first_enabled_elmt
$64 = 7400448
(gdb) p last_enabled_elmt
$65 = 8222719
(gdb) p $65-$64
$66 = 822271

However, when I ask for a remote physical instance on node 1, I see the following being returned from the StaticAccess data controlled by the reservation associated with the index sub-space:

(gdb) p first_elmt
$1 = 0
(gdb) p last_elmt
$2 = 13156351
(gdb) p num_elements
$3 = 13156352

I'm pretty sure the problem is that the meta-data used by the StaticAccess is not being properly updated by the IndexSpaceAllocators when allocations are being performed and so it stays fixed at the absolute maximum boundaries of the index space even when in practice they are much smaller.

S3D performance problem on Titan

Recent versions of the runtime (after 7b73e22)
show poor, and erratic, performance on S3D on Titan with stencils disabled. Instead of
timesteps on the order of 2.1s, they vary wildly between 3s and 11s. The Legion profiler
shows that leaf tasks appear to be correct, but large gaps exist between them.

This issue is known to occur with stencils disabled, so is unlikely to be related to issue #17.
However, that issue prevents measuring performance with stencils enabled (i.e. this bug may
have nothing to do with stencils at all).

Realm Check for Catchup Assertion

Saw this assertion tonight. In the process of handling a remote trigger active message, Realm first has to get the event implementation. When it does this, it checks for catch-up, but inside the catch-up check there is an assertion that it is only happening for a remote event. However, in this case, the event was remote, but it's not anymore. I'm pretty sure it is safe to remove the assertion, but you should check.
#6  0x00007f2d02659a76 in __assert_fail_base (fmt=0x7f2d027ab370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xe4f050 "owner != (((void)0), (gasnet_node_t)gasneti_mynode)", file=file@entry=0xe4eec0 "/home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc", line=line@entry=531, function=function@entry=0xe501e0 <Realm::GenEventImpl::check_for_catchup(unsigned int)::__PRETTY_FUNCTION__> "void Realm::GenEventImpl::check_for_catchup(Realm::Event::gen_t)") at assert.c:92
#7  0x00007f2d02659b22 in __GI___assert_fail (assertion=0xe4f050 "owner != (((void)0), (gasnet_node_t)gasneti_mynode)", file=0xe4eec0 "/home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc", line=531, function=0xe501e0 <Realm::GenEventImpl::check_for_catchup(unsigned int)::__PRETTY_FUNCTION__> "void Realm::GenEventImpl::check_for_catchup(Realm::Event::gen_t)") at assert.c:101
#8  0x0000000000d140c0 in Realm::GenEventImpl::check_for_catchup (this=0x7f2c8407bdb0, implied_trigger_gen=8) at /home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc:531
#9  0x0000000000ca3d9c in Realm::RuntimeImpl::get_genevent_impl (this=0x28049c0, e=...) at /home/mebauer/extern_legion/legion/runtime//realm/runtime_impl.cc:1262
#10 0x0000000000d14774 in Realm::EventTriggerMessage::handle_request (args=...) at /home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc:700
#11 0x0000000000cba94d in IncomingShortMessage<Realm::EventTriggerMessage::RequestArgs, 147, &Realm::EventTriggerMessage::handle_request, 6>::run_handler (this=0x7f2c94007580) at /home/mebauer/extern_legion/legion/runtime/activemsg.h:336
#12 0x0000000000d40c9d in IncomingMessageManager::handler_thread_loop (this=0x55e4860) at /home/mebauer/extern_legion/legion/runtime//activemsg.cc:770
#13 0x0000000000d48e20 in Realm::Thread::thread_entry_wrapper<IncomingMessageManager, &IncomingMessageManager::handler_thread_loop> (obj=0x55e4860) at /home/mebauer/extern_legion/legion/runtime//realm/threads.inl:127
#14 0x0000000000cee0b9 in Realm::KernelThread::pthread_entry (data=0x55e4ac0) at /home/mebauer/extern_legion/legion/runtime//realm/threads.cc:555
#15 0x00007f2d03700182 in start_thread (arg=0x7f2cfcfd2700) at pthread_create.c:312
#16 0x00007f2d02724fbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The event itself looks like a normal event (0x20001000000000be,0x9) and this assertion happened on node 1 out of a 2 node run.

Segmentation Fault when running "tester_io" across more than one process

This test uses "tester_io" (updated in the DMA branch under tests/hdf_attach_subregion_parallel).
A partial fix was to change the index used to access the hdf_metadata_vec structure. There is still an issue in that the wrong HDFMemory is chosen from machine.get_all_memories in realm/idx_impl.cc. A fix for this is to choose the HDFMemory whose kind is MemoryImpl::MKIND_HDF and not MemoryImpl::MKIND_REMOTE, as in the diff below. I'll submit a pull request for the master and dma branches in just a bit.

diff --git a/runtime/realm/idx_impl.cc b/runtime/realm/idx_impl.cc
index a29d408..b631b90 100644
--- a/runtime/realm/idx_impl.cc
+++ b/runtime/realm/idx_impl.cc
@@ -485,14 +485,26 @@ namespace Realm {
       Machine machine = Machine::get_machine();
       std::set<Memory> mem;
       machine.get_all_memories(mem);
+      std::cout << "Node: " << gasnet_mynode() << " mem.size() is: " << mem.size() << std::endl;
       for(std::set<Memory>::iterator it = mem.begin(); it != mem.end(); it++) {
+        std::cout << "Node: " << gasnet_mynode() << " in Domain::create_hdf5_instance, looking at memory: "
+                  << &(*it) << " kind is : " << it->kind() << std::endl;
         if (it->kind() == Memory::HDF_MEM) {
           memory = *it;
+          HDFMemory* hdf_mem = (HDFMemory*) get_runtime()->get_memory_impl(memory);
+          std::cout << "Node: " << gasnet_mynode() << " found matching HDFMemory: " << hdf_mem
+                    << " kind is: " << it->kind() << " HDFMemory.kind is: "
+                    << hdf_mem->kind << std::endl;
+          if(hdf_mem->kind == MemoryImpl::MKIND_HDF)
+            break; /* this is usable, take it */
         }

(lldb) thread backtrace all
  thread #1: tid = 0xec9ca, 0x00007fff8cbe848a libsystem_kernel.dylib`__semwait_signal + 10, queue = 'com.apple.main-thread'
    frame #0: 0x00007fff8cbe848a libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff8d39af5d libsystem_c.dylib`nanosleep + 199
    frame #2: 0x00007fff8d39adbd libsystem_c.dylib`sleep + 42
    frame #3: 0x0000000107cf9bba tester_io`LegionRuntime::LowLevel::RuntimeImpl::run(this=0x00007fda8bc053a0, task_id=0, style=ONE_TASK_ONLY, args=0x0000000000000000, arglen=0, background=false) + 3930 at lowlevel.cc:10789
    frame #4: 0x0000000107cf8c50 tester_io`Realm::Runtime::run(this=0x00007fff57f67428, task_id=0, style=ONE_TASK_ONLY, args=0x0000000000000000, arglen=0, background=false) + 80 at lowlevel.cc:9807
    frame #5: 0x00000001083ab635 tester_io`LegionRuntime::HighLevel::Runtime::start(argc=2, argv=0x00007fff57f67d38, background=false) + 7797 at runtime.cc:15687
    frame #6: 0x0000000107e4adfa tester_io`LegionRuntime::HighLevel::HighLevelRuntime::start(argc=4, argv=0x00007fff57f67d28, background=false) + 42 at legion.cc:3671
    frame #7: 0x0000000107cb9816 tester_io`main(argc=4, argv=0x00007fff57f67d28) + 374 at tester_io.cc:244
    frame #8: 0x00007fff93b0b5c9 libdyld.dylib`start + 1

  thread #2: tid = 0xec9d5, 0x00007fff8cbe7c22 libsystem_kernel.dylib`__ioctl + 10
    frame #0: 0x00007fff8cbe7c22 libsystem_kernel.dylib`__ioctl + 10
    frame #1: 0x00007fff8cbe59b7 libsystem_kernel.dylib`ioctl + 159
    frame #2: 0x00000001085a545f tester_io`AMUDP_DrainNetwork(amudp_ep*) + 63
    frame #3: 0x00000001085a40a1 tester_io`AM_Poll + 97
    frame #4: 0x000000010853b4e2 tester_io`gasnetc_AMPoll + 50
    frame #5: 0x0000000107dacab4 tester_io`do_some_polling() [inlined] gasneti_AMPoll + 5 at gasnet_help.h:597
    frame #6: 0x0000000107dacaaf tester_io`do_some_polling() [inlined] gasnet_AMPoll at gasnet_help.h:712
    frame #7: 0x0000000107dacaaf tester_io`do_some_polling() + 31 at activemsg.cc:2057
    frame #8: 0x0000000107daced6 tester_io`gasnet_poll_thread_loop(data=0x0000000000000000) + 38 at activemsg.cc:2073
    frame #9: 0x00007fff98cfb268 libsystem_pthread.dylib`_pthread_body + 131
    frame #10: 0x00007fff98cfb1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #11: 0x00007fff98cf941d libsystem_pthread.dylib`thread_start + 13

  thread #3: tid = 0xec9d7, 0x00007fff8cbe8136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff8cbe8136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff98cfbe0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x0000000107dab843 tester_io`IncomingMessageManager::get_messages(this=0x00007fda8bd00a30, sender=0x0000000109120ea4, wait=true) + 131 at activemsg.cc:686
    frame #3: 0x0000000107dab9c1 tester_io`LegionRuntime::LowLevel::HandlerThread::thread_main(this=0x00007fda8bd00b00) + 49 at activemsg.cc:725
    frame #4: 0x0000000107ce5d1d tester_io`LegionRuntime::LowLevel::PreemptableThread::thread_entry(data=0x00007fda8bd00b00) + 173 at lowlevel.cc:6392
    frame #5: 0x00007fff98cfb268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff98cfb1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff98cf941d libsystem_pthread.dylib`thread_start + 13

  thread #4: tid = 0xec9d8, 0x00007fff8cbe8166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff8cbe8166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff98cf978a libsystem_pthread.dylib`_pthread_mutex_lock + 480
    frame #2: 0x0000000107d97e2b tester_io`LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(this=0x00007fda8bd00e60, dma_thread=0x00007fda8bd00f10, wait_on_empty=false) + 2587 at channel.h:1101
    frame #3: 0x0000000107d886a8 tester_io`LegionRuntime::LowLevel::DMAThread::dma_thread_loop(this=0x00007fda8bd00f10) + 1192 at channel.cc:1453
    frame #4: 0x0000000107dde54d tester_io`LegionRuntime::LowLevel::DMAThread::start(arg=0x00007fda8bd00f10) + 29 at channel.h:980
    frame #5: 0x00007fff98cfb268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff98cfb1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff98cf941d libsystem_pthread.dylib`thread_start + 13

  thread #5: tid = 0xec9d9, 0x00007fff8cbe8166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff8cbe8166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff98cf978a libsystem_pthread.dylib`_pthread_mutex_lock + 480
    frame #2: 0x0000000107d886cb tester_io`LegionRuntime::LowLevel::DMAThread::dma_thread_loop(this=0x00007fda8bd01330) + 1227 at channel.cc:1455
    frame #3: 0x0000000107dde54d tester_io`LegionRuntime::LowLevel::DMAThread::start(arg=0x00007fda8bd01330) + 29 at channel.h:980
    frame #4: 0x00007fff98cfb268 libsystem_pthread.dylib`_pthread_body + 131
    frame #5: 0x00007fff98cfb1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #6: 0x00007fff98cf941d libsystem_pthread.dylib`thread_start + 13

  thread #6: tid = 0xec9da, 0x00007fff8cbe8166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #0: 0x00007fff8cbe8166 libsystem_kernel.dylib`__psynch_mutexwait + 10
    frame #1: 0x00007fff98cf978a libsystem_pthread.dylib`_pthread_mutex_lock + 480
    frame #2: 0x0000000107d97e2b tester_io`LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(this=0x00007fda8bd00e60, dma_thread=0x00007fda8bd01700, wait_on_empty=false) + 2587 at channel.h:1101
    frame #3: 0x0000000107d886a8 tester_io`LegionRuntime::LowLevel::DMAThread::dma_thread_loop(this=0x00007fda8bd01700) + 1192 at channel.cc:1453
    frame #4: 0x0000000107dde54d tester_io`LegionRuntime::LowLevel::DMAThread::start(arg=0x00007fda8bd01700) + 29 at channel.h:980
    frame #5: 0x00007fff98cfb268 libsystem_pthread.dylib`_pthread_body + 131
    frame #6: 0x00007fff98cfb1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #7: 0x00007fff98cf941d libsystem_pthread.dylib`thread_start + 13

* thread #7: tid = 0xec9dc, 0x0000000107d8288c tester_io`LegionRuntime::LowLevel::HDFMemory::create_instance(Realm::IndexSpace, int const*, unsigned long, unsigned long, unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&, int, long long, Realm::ProfilingRequestSet const&, Realm::RegionInstance, char const*, std::__1::vector<char const*, std::__1::allocator<char const*> > const&, Realm::Domain, bool) [inlined] std::__1::__vector_base<LegionRuntime::LowLevel::HDFMemory::HDFMetadata*, std::__1::allocator<LegionRuntime::LowLevel::HDFMemory::HDFMetadata*> >::__alloc(this=0x00007fda8bd00d30, __a=0x00007fda8bd00d40, __p=0x0000000000000008, __a0=0x00000001298b19f8) + 50 at memory:1462, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
  * frame #0: 0x0000000107d8288c tester_io`LegionRuntime::LowLevel::HDFMemory::create_instance(Realm::IndexSpace, int const*, unsigned long, unsigned long, unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&, int, long long, Realm::ProfilingRequestSet const&, Realm::RegionInstance, char const*, std::__1::vector<char const*, std::__1::allocator<char const*> > const&, Realm::Domain, bool) [inlined] std::__1::__vector_base<LegionRuntime::LowLevel::HDFMemory::HDFMetadata*, std::__1::allocator<LegionRuntime::LowLevel::HDFMemory::HDFMetadata*> >::__alloc(this=0x00007fda8bd00d30, __a=0x00007fda8bd00d40, __p=0x0000000000000008, __a0=0x00000001298b19f8) + 50 at memory:1462
    frame #1: 0x0000000107d8285a tester_io`LegionRuntime::LowLevel::HDFMemory::create_instance(Realm::IndexSpace, int const*, unsigned long, unsigned long, unsigned long, std::__1::vector<unsigned long, std::__1::allocator<unsigned long> > const&, int, long long, Realm::ProfilingRequestSet const&, Realm::RegionInstance, char const*, std::__1::vector<char const*, std::__1::allocator<char const*> > const&, Realm::Domain, bool) [inlined] std::__1::vector<LegionRuntime::LowLevel::HDFMemory::HDFMetadata*, std::__1::allocator<LegionRuntime::LowLevel::HDFMemory::HDFMetadata*> >::push_back(this=0x00007fda8bd00d30, __x=0x00000001298b19f8) + 180 at vector:1538
    frame #2: 0x0000000107d827a6 tester_io`LegionRuntime::LowLevel::HDFMemory::create_instance(this=0x00007fda8bd00c90, is=(id = 0), linearization_bits=0x00000001298b2340, bytes_needed=128, block_size=16, element_size=8, field_sizes=0x00000001298b27d0, redopid=0, list_size=-1, reqs=0x00000001298b1f10, parent_inst=(id = 0), file=0x00007fda8be5e6a0, path_names=0x00000001298b2430, domain=<unavailable>, read_only=false) + 2550 at lowlevel_disk.cc:216
    frame #3: 0x0000000107ced9d2 tester_io`Realm::Domain::create_hdf5_instance(this=0x00007fda8be0a0e8, file_name=0x00007fda8be5e6a0, field_sizes=0x00000001298b27d0, field_files=0x00000001298b2430, read_only=false) const + 3570 at lowlevel.cc:7467
    frame #4: 0x0000000107eb5a7a tester_io`LegionRuntime::HighLevel::AttachOp::create_instance(this=0x00007fda8be6a6e0, dom=0x00007fda8be0a0e8, sizes=0x00000001298b27d0) + 1418 at legion_ops.cc:8987
    frame #5: 0x00000001080b5bbf tester_io`LegionRuntime::HighLevel::FieldSpaceNode::create_file_instance(this=0x00007fda8be04ec0, create_fields=0x00007fda8be6a808, attach_mask=0x00000001298b2a20, node=0x00007fda8d91e6a0, attach_op=0x00007fda8be6a6e0) + 223 at region_tree.cc:9228
    frame #6: 0x000000010807509c tester_io`LegionRuntime::HighLevel::RegionNode::attach_file(this=0x00007fda8d91e6a0, ctx=35, attach_mask=0x00000001298b2a20, req=0x00007fda8be6a7e8, attach_op=0x00007fda8be6a6e0) + 204 at region_tree.cc:17454
    frame #7: 0x0000000108074f79 tester_io`LegionRuntime::HighLevel::RegionTreeForest::attach_file(this=0x00007fda8be002e0, ctx=(ctx = 35), req=0x00007fda8be6a7e8, attach_op=0x00007fda8be6a6e0) + 313 at region_tree.cc:2572
    frame #8: 0x0000000107eb53e7 tester_io`LegionRuntime::HighLevel::AttachOp::trigger_execution(this=0x00007fda8be6a6e0) + 391 at legion_ops.cc:8952
    frame #9: 0x00000001083b175f tester_io`LegionRuntime::HighLevel::Runtime::high_level_runtime_task(args=0x00007fda8d824260, arglen=20, p=(id = 2147483648)) + 911 at runtime.cc:16672
    frame #10: 0x0000000107ce53f7 tester_io`LegionRuntime::LowLevel::PreemptableThread::run_task(this=0x00007fda8bf00650, task=0x00007fda8d8006e0, actual_proc=(id = 2147483648)) + 231 at lowlevel.cc:6344
    frame #11: 0x0000000107ce623d tester_io`LegionRuntime::LowLevel::GreenletTask::run(this=0x00007fda8bc36380, arg=0x00007fda8bf00650) + 61 at lowlevel.cc:6441
    frame #12: 0x0000000107e1adc9 tester_io`greenlet::_run(arg=0x00007fda8bf00650) + 57 at greenlet-cc.cc:158
    frame #13: 0x0000000107e1a84e tester_io`_greenlet_start(arg=0x00007fda8bc28460) + 110 at greenlet.cc:122

  thread #8: tid = 0xec9df, 0x00007fff8cbe8136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff8cbe8136 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff98cfbe0c libsystem_pthread.dylib`_pthread_cond_wait + 693
    frame #2: 0x0000000107cfda1d tester_io`GASNetCondVar::wait(this=0x00007fda8bf002e0) + 45 at activemsg.h:124
    frame #3: 0x0000000107ce688b tester_io`LegionRuntime::LowLevel::GreenletProcessor::execute_task(this=0x00007fda8bf00270) + 411 at lowlevel.cc:6650
    frame #4: 0x0000000107ce66b7 tester_io`LegionRuntime::LowLevel::GreenletThread::thread_main(this=0x00007fda8d900000) + 55 at lowlevel.cc:6467
    frame #5: 0x0000000107ce5d1d tester_io`LegionRuntime::LowLevel::PreemptableThread::thread_entry(data=0x00007fda8d900000) + 173 at lowlevel.cc:6392
    frame #6: 0x00007fff98cfb268 libsystem_pthread.dylib`_pthread_body + 131
    frame #7: 0x00007fff98cfb1e5 libsystem_pthread.dylib`_pthread_start + 176
    frame #8: 0x00007fff98cf941d libsystem_pthread.dylib`thread_start + 13

Here is a bit of info from the debugger; note that the HDFMemory thinks it is "remote":
frame #2: 0x0000000107d827a6 tester_io`LegionRuntime::LowLevel::HDFMemory::create_instance(this=0x00007fda8bd00c90, is=(id = 0), linearization_bits=0x00000001298b2340, bytes_needed=128, block_size=16, element_size=8, field_sizes=0x00000001298b27d0, redopid=0, list_size=-1, reqs=0x00000001298b1f10, parent_inst=(id = 0), file=0x00007fda8be5e6a0, path_names=0x00000001298b2430, domain=, read_only=false) + 2550 at lowlevel_disk.cc:216
   213      pthread_rwlock_unlock(&this->rwlock);
   214      return inst;
   215    }
-> 216
   217    void HDFMemory::destroy_instance(RegionInstance i,
   218                                     bool local_destroy)
   219    {
(lldb) p this
(LegionRuntime::LowLevel::HDFMemory *) $3 = 0x00007fda8bd00c90
(lldb) p *this
(LegionRuntime::LowLevel::HDFMemory) $4 = {
LegionRuntime::LowLevel::MemoryImpl = {
me = (id = 1619066880)
size = 0
kind =
alignment = 0
lowlevel_kind = HDF_MEM
mutex = {
mutex = {
lock = (__sig = 1297437784, __opaque = "")
}
}
instances = size=2 {
[0] = 0x00007fda8bd7e280
[1] = 0x00007fda8d83ba80
}
free_blocks = size=0 {}
}
hdf_metadata_vec = size=0 {}
rwlock = (__sig = 4294967396, __opaque = "")
}

Assertions Triggering in legion_prof.py

Hi,

I'm trying to generate some Legion profiles using Legion Prof and am seeing the following assertions:

:>> /Users/samuel/devel/legion/tools/legion_prof.py -p OUT
Loading log file OUT...
Traceback (most recent call last):
File "/Users/samuel/devel/legion/tools/legion_prof.py", line 1630, in <module>
main()
File "/Users/samuel/devel/legion/tools/legion_prof.py", line 1602, in main
total_matches = parse_log_file(file_name, state)
File "/Users/samuel/devel/legion/tools/legion_prof.py", line 1502, in parse_log_file
time = long(m.group('time')))
File "/Users/samuel/devel/legion/tools/legion_prof.py", line 1338, in create_instance
redop, bf, time)
File "/Users/samuel/devel/legion/tools/legion_prof.py", line 872, in set_create
assert self.blocking_factor is None or self.blocking_factor == blocking_factor
AssertionError

If I comment out the assertions, I eventually get the expected output.

I'll be happy to create a more detailed report if need be.

My code is compiled with:
OUTPUT_LEVEL=LEVEL_DEBUG
-DLEGION_PROF -DMAX_FIELDS=64 -DFIELD_LOG2=6

Thanks,

Sam

segfault running partitioning example

Running the 07_partitioning example I am getting a segfault. Here is a back trace:

eduroam-255-150:07_partitioning nwatkins$ lldb ./partitioning
Current executable set to './partitioning' (x86_64).
(lldb) run
Process 14962 launched: './partitioning' (x86_64)
Running daxpy for 1024 elements...
Partitioning data into 4 sub-regions...
Initializing field 0 for block 2...
Initializing field 0 for block 1...
Initializing field 0 for block 3...
Initializing field 0 for block 0...
Initializing field 1 for block 0...
Initializing field 1 for block 1...
Process 14962 stopped
* thread #1: tid = 0xf81c8, 0x000000010001a000 partitioning`LegionRuntime::LowLevel::Runtime::get_reservation_impl(this=0x0000000101001200, r=Reservation at 0x00007fff5fbfdae8) + 352 at shared_lowlevel.cc:4683, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000010001a000 partitioning`LegionRuntime::LowLevel::Runtime::get_reservation_impl(this=0x0000000101001200, r=Reservation at 0x00007fff5fbfdae8) + 352 at shared_lowlevel.cc:4683
   4680         assert(r.id != 0);
   4681         assert(r.id < reservations.size());
   4682 #endif
-> 4683         ReservationImpl *result = reservations[r.id];
   4684         PTHREAD_SAFE_CALL(pthread_rwlock_unlock(&reservation_lock));
   4685         return result;
   4686     }
  thread #3: tid = 0xf82b0, 0x000000010001a000 partitioning`LegionRuntime::LowLevel::Runtime::get_reservation_impl(this=0x0000000101001200, r=Reservation at 0x0000000100e00948) + 352 at shared_lowlevel.cc:4683, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x000000010001a000 partitioning`LegionRuntime::LowLevel::Runtime::get_reservation_impl(this=0x0000000101001200, r=Reservation at 0x0000000100e00948) + 352 at shared_lowlevel.cc:4683
   4680         assert(r.id != 0);
   4681         assert(r.id < reservations.size());
   4682 #endif
-> 4683         ReservationImpl *result = reservations[r.id];
   4684         PTHREAD_SAFE_CALL(pthread_rwlock_unlock(&reservation_lock));
   4685         return result;
   4686     }
(lldb) 
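
The crashing line is the unguarded `reservations[r.id]` read: the two bounds assertions above it sit inside an `#ifdef` block (note the `#endif` at line 4682), so in a build without that flag an out-of-range or stale `r.id` indexes past the vector and faults. A minimal sketch of an unconditionally checked lookup; the types here are hypothetical stand-ins, not the real runtime classes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Reservation / ReservationImpl.
struct ReservationImpl { int id; };
struct Reservation { std::size_t id; };

// Bounds-check unconditionally, instead of only under a debug #ifdef,
// so a bad id fails cleanly in release builds too.
ReservationImpl *get_reservation_impl(
    const std::vector<ReservationImpl*> &reservations, Reservation r)
{
    if (r.id == 0 || r.id >= reservations.size())
        return nullptr;  // caller can report a proper runtime error
    return reservations[r.id];
}
```

Returning `nullptr` (or raising a runtime error) on a bad id turns a general-protection fault into a diagnosable failure.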

segmentation fault in activemsg

I hit a segmentation fault when running the dma_random test: a ContiguousPayload instance tries to free a memory segment at the wrong address. The backtrace is as follows (the line numbers refer to the dma branch).

[0] Thread 8 (Thread 0x7f4cdd6f7700 (LWP 32548)):
[0] #0 0x00007f4d3acd2619 in __libc_waitpid (pid=32556, stat_loc=stat_loc@entry=0x7f4cdd6f4390, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:40
[0] #1 0x00007f4d3ac571d2 in do_system (line=) at ../sysdeps/posix/system.c:148
[0] #2 0x0000000000d038a8 in gasneti_bt_gdb ()
[0] #3 0x0000000000d0670b in gasneti_print_backtrace ()
[0] #4 0x0000000000d5f241 in gasneti_defaultSignalHandler ()
[0] #5  <signal handler called>
[0] #6 0x00007f4d3a557471 in opal_memory_ptmalloc2_int_free () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #7 0x00007f4d3a557bb3 in opal_memory_ptmalloc2_free () from /usr/local/openmpi-1.8.2/lib/libopen-pal.so.6
[0] #8 0x00000000009a59f6 in ContiguousPayload::copy_data (this=0x1a1e8a0, dest=0x7f4d30afcf40) at ../../runtime//activemsg.cc:2341
[0] #9 0x00000000009a43cc in OutgoingMessage::reserve_srcdata (this=0x1a45f40) at ../../runtime//activemsg.cc:1735
[0] #10 0x00000000009a73d5 in ActiveMessageEndpoint::enqueue_message (this=0x1a0d640, hdr=0x1a45f40, in_order=true) at ../../runtime//activemsg.cc:1080
[0] #11 0x00000000009aaf60 in EndpointManager::enqueue_message (this=0x1a0d1a0, target=1, hdr=0x1a45f40, in_order=true) at ../../runtime//activemsg.cc:1870
[0] #12 0x00000000009a5659 in enqueue_message (target=1, msgid=170, args=0x7f4cdd6f6d60, arg_size=24, payload=0x1a46060, payload_size=172, payload_mode=2, dstptr=0x0) at ../../runtime//activemsg.cc:2256
[0] #13 0x00000000009e257c in ActiveMessageMediumNoReply<170, Realm::MetadataResponseMessage::RequestArgs, &Realm::MetadataResponseMessage::handle_request>::request (dest=1, args=..., data=0x1a46060, datalen=172, payload_mode=2, dstptr=0x0) at ../../runtime/activemsg.h:595
[0] #14 0x00000000009e1daa in Realm::MetadataResponseMessage::send_request (target=1, id=3766419464, data=0x1a46060, datalen=172, payload_mode=2) at ../../runtime//realm/metadata.cc:297
[0] #15 0x00000000009e1c63 in Realm::MetadataRequestMessage::handle_request (args=...) at ../../runtime//realm/metadata.cc:248
[0] #16 0x0000000000a39176 in IncomingShortMessage<Realm::MetadataRequestMessage::RequestArgs, 169, &Realm::MetadataRequestMessage::handle_request, 2>::run_handler (this=0x7f4d33a0df20) at ../../runtime/activemsg.h:331
[0] #17 0x00000000009a3e93 in Realm::HandlerThread::thread_main (this=0x1a0e5a0) at ../../runtime//activemsg.cc:741
[0] #18 0x00000000009f9927 in Realm::PreemptableThread::thread_entry (data=0x1a0e5a0) at ../../runtime//realm/proc_impl.cc:1039
[0] #19 0x00007f4d3b9e1182 in start_thread (arg=0x7f4cdd6f7700) at pthread_create.c:312
[0] #20 0x00007f4d3ad0bfbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
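
A crash inside `opal_memory_ptmalloc2_int_free` while copying payload data is the classic symptom of calling `free()` on a pointer other than the one `malloc()` returned. A hedged sketch of the invariant with a hypothetical payload type (not the real Realm class): keep the original base pointer separate from any offset data pointer, and only ever free the base:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: the pointer malloc() returned is stored
// separately from the (possibly offset) data pointer, and free()
// always gets the original base.
struct ContiguousPayload {
    void       *base;   // exactly what malloc() returned
    char       *data;   // may point into the buffer at an offset
    std::size_t size;

    explicit ContiguousPayload(std::size_t bytes, std::size_t offset = 0)
        : base(std::malloc(bytes + offset)),
          data(static_cast<char*>(base) + offset),
          size(bytes) {}

    // Non-copyable: copying would double-free the buffer.
    ContiguousPayload(const ContiguousPayload&) = delete;
    ContiguousPayload &operator=(const ContiguousPayload&) = delete;

    void copy_data(void *dest) { std::memcpy(dest, data, size); }

    ~ContiguousPayload() { std::free(base); }  // never free(data)
};
```

Freeing `data` instead of `base` (or freeing a buffer whose ownership already moved elsewhere) produces exactly this kind of allocator abort.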

Corrupted data in S3D

With stencils enabled, S3D crashes after a small-but-variable number of timesteps
with bad temperature data. This bug has only been reproduced on Titan so far,
and was introduced some time after 7b73e22.
It may be a duplicate of issue #17 and is suspected to be related to the changes to
the handling of restricted instances.

Locally mapped but remote execution for non-leaf task bug

Running the C++ miniaero (non-SPMD version) with a mapper that maps tasks locally gives me the following assertion failure:

mini-Aero.exe: /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9620: void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int): Assertion `complete_points <= total_points' failed.

Here is the stacktrace:

(gdb) bt
#0 0x00007fb6f57f39bd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007fb6f57f3854 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2 0x0000000000cbf3f8 in Realm::realm_freeze (signal=6) at /home/wclee/Workspace/legion/runtime/realm/runtime_impl.cc:75
#3  <signal handler called>
#4 0x00007fb6f5768bb9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#5 0x00007fb6f576bfc8 in __GI_abort () at abort.c:89
#6 0x00007fb6f5761a76 in __assert_fail_base (fmt=0x7fb6f58b3370 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0xe18998 "complete_points <= total_points", file=file@entry=0xe16200 "/home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc", line=line@entry=9620, function=function@entry=0xe1f480 <LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)::__PRETTY_FUNCTION__> "void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)") at assert.c:92
#7 0x00007fb6f5761b22 in __GI___assert_fail (assertion=0xe18998 "complete_points <= total_points", file=0xe16200 "/home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc", line=9620, function=0xe1f480 <LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)::__PRETTY_FUNCTION__> "void LegionRuntime::HighLevel::IndexTask::return_slice_complete(unsigned int)") at assert.c:101
#8 0x0000000000a834d8 in LegionRuntime::HighLevel::IndexTask::return_slice_complete (this=0x334aba0, points=1) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9620
#9 0x0000000000a83b23 in LegionRuntime::HighLevel::IndexTask::unpack_slice_complete (this=0x334aba0, derez=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9738
#10 0x0000000000a83c5a in LegionRuntime::HighLevel::IndexTask::process_slice_complete (derez=...) at /home/wclee/Workspace/legion/runtime/legion/legion_tasks.cc:9767
#11 0x0000000000c22cca in LegionRuntime::HighLevel::Runtime::handle_slice_remote_complete (this=0x331fd80, derez=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:11672
#12 0x0000000000c08cdb in LegionRuntime::HighLevel::VirtualChannel::handle_messages (this=0x7fb0b11108e0, num_messages=6, runtime=0x331fd80, remote_address_space=1, args=0x7fb0a054a604 "\240\253\064\003", arglen=160) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3352
#13 0x0000000000c08871 in LegionRuntime::HighLevel::VirtualChannel::process_message (this=0x7fb0b11108e0, args=0x7fb0a054a1cc, arglen=1232, runtime=0x331fd80, remote_address_space=1) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3145
#14 0x0000000000c09789 in LegionRuntime::HighLevel::MessageManager::receive_message (this=0x7fb0bb3fad00, args=0x7fb0a054a1c8, arglen=1240) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:3771
#15 0x0000000000c23999 in LegionRuntime::HighLevel::Runtime::process_message_task (this=0x331fd80, args=0x7fb0a054a1c4, arglen=1244) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:12179
#16 0x0000000000c2d39c in LegionRuntime::HighLevel::Runtime::high_level_runtime_task (args=0x7fb0a054a1c0, arglen=1248, p=...) at /home/wclee/Workspace/legion/runtime/legion/runtime.cc:15795
#17 0x0000000000d207e4 in Realm::Task::execute_on_processor (this=0x7fb0a05d25c0, p=...) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:80
#18 0x0000000000d22a66 in Realm::UserThreadTaskScheduler::execute_task (this=0x3306c20, task=0x7fb0a05d25c0) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:887
#19 0x0000000000d21527 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x3306c20) at /home/wclee/Workspace/legion/runtime/realm/tasks.cc:448
#20 0x0000000000d2625e in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x3306c20) at /home/wclee/Workspace/legion/runtime/realm/threads.inl:127
#21 0x0000000000d0e335 in Realm::UserThread::uthread_entry () at /home/wclee/Workspace/legion/runtime/realm/threads.cc:740
#22 0x00007fb6f577b7a0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x0000000000000000 in ?? ()
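
The failing assertion `complete_points <= total_points` is a simple monotonic-counter invariant: each slice of an index task reports how many of its points finished, and the running total can only legally reach the number of points once. A duplicate completion message (e.g. a slice counted both locally and remotely when a locally mapped task executes remotely) trips it immediately. A minimal hypothetical sketch of the invariant:

```cpp
#include <cassert>

// Hypothetical sketch of the completion-counting invariant behind the
// assertion. Each call reports how many points of one slice completed;
// the running total must never exceed the task's total point count.
struct IndexTaskCounter {
    unsigned total_points;
    unsigned complete_points = 0;

    // Returns true once every point of the index task has completed.
    bool return_slice_complete(unsigned points)
    {
        complete_points += points;
        assert(complete_points <= total_points);  // fires on a duplicate
        return complete_points == total_points;
    }
};
```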

Need get_index_subspace<3> fixes applied to s3d branch

S3D uses 3-D "colors" for index subspaces. This was broken by the merge of the
new partitioning API code and fixed fairly recently in the master branch. Unfortunately,
the master branch contains other issues that break S3D. To help with that debugging, the
fixes for this issue need to be applied manually (they do not merge cleanly) to the s3d
branch. Without them, the failure looks like this:

s3d.x: /autofs/nccs-svm1_home1/seant/s3d/legion_s3d/legion/runtime/lowlevel.h:821: LegionRuntime::Arrays::Point<DIM> LegionRuntime::LowLevel::DomainPoint::get_point() const [with int DIM = 3]: Assertion `dim == DIM' failed.
*** Caught a fatal signal: SIGABRT(6) on node 0/1
[0] 6: /lib64/libc.so.6(__assert_fail+0xf0) [0x2aaaadd56740]
[0] 7: ../bin/librhsf.so(_ZN13LegionRuntime9HighLevel16RegionTreeForest22create_index_partitionENS0_14IndexPartitionENS0_10IndexSpaceENS0_10ColorPointERKSt3mapINS_8LowLevel11DomainPointENS6_6DomainESt4lessIS7_ESaISt4pairIKS7_S8_EEERKS8_23legion_partition_kind_t22legion_allocate_mode_t+0x562) [0x2aaaab02f0f2]
[0] 8: ../bin/librhsf.so(_ZN13LegionRuntime9HighLevel7Runtime22create_index_partitionEPNS0_10SingleTaskENS0_10IndexSpaceENS_8LowLevel6DomainERKSt3mapIjS6_St4lessIjESaISt4pairIKjS6_EEEbi+0x3d5) [0x2aaaab0927a5]
[0] 9: ../bin/librhsf.so(_ZN13LegionRuntime9HighLevel16HighLevelRuntime22create_index_partitionEPNS0_10SingleTaskENS0_10IndexSpaceENS_8LowLevel6DomainERKSt3mapIjS6_St4lessIjESaISt4pairIKjS6_EEEbi+0x91) [0x2aaaaaf6d5c1]
[0] 10: ../bin/librhsf.so(_ZN13LegionRuntime9HighLevel16HighLevelRuntime22create_index_partitionINS_6Arrays8BlockifyILj3EEEEENS0_14IndexPartitionEPNS0_10SingleTaskENS0_10IndexSpaceERKT_i+0x49c) [0x2aaaaae676fc]
[0] 11: ../bin/librhsf.so(_Z14top_level_taskPKN13LegionRuntime9HighLevel4TaskERKSt6vectorINS0_14PhysicalRegionESaIS5_EEPNS0_10SingleTaskEPNS0_16HighLevelRuntimeE+0x431) [0x2aaaaae5d049]
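
The underlying check is a runtime-vs-compile-time dimensionality comparison: a `DomainPoint` records its dimension at runtime, while `get_point<DIM>()` is instantiated for a fixed `DIM`, so a 3-D color flowing through a code path compiled for a different `DIM` aborts with exactly this assertion. A hypothetical sketch of the pattern (not the real Legion types):

```cpp
#include <cassert>

// Hypothetical sketch of the dimension check behind the failure: the
// point stores its dimensionality at runtime, while get_point<DIM>()
// is instantiated at a fixed compile-time DIM. A mismatch (e.g. a 3-D
// color reaching a path compiled for DIM = 1) trips the assert.
template <int MAX_DIM = 3>
struct DomainPoint {
    int dim;                // runtime dimensionality
    int coords[MAX_DIM];

    template <int DIM>
    const int *get_point() const
    {
        assert(dim == DIM);  // analogous to `dim == DIM' in lowlevel.h
        return coords;
    }
};
```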

dma: assertion failure in multiple_partitions example

nwatkins@node0:~/src/legion/examples/08_multiple_partitions$ GASNET_BACKTRACE=1 SSH_SERVERS="node0" amudprun -np 1 ./multiple_partitions
Pinned Memory Usage: GASNET=256, RMEM=0, LMB=2, SDP=64, total=322
multiple_partitions: /users/nwatkins/src/legion/runtime/channel.h:1085: void LegionRuntime::LowLevel::XferDesQueue::enqueue_xferDes(LegionRuntime::LowLevel::XferDes*): Assertion `it != channel_to_dma_thread.end()' failed.
Running stencil computation for 1024 elements...
Partitioning data into 4 sub-regions...
Initializing field 0 for block 0...
Initializing field 0 for block 1...
Initializing field 0 for block 2...
Initializing field 0 for block 3...
*** Caught a fatal signal: SIGABRT(6) on node 0/1

Legion spy and legion disagree on cases where partition disjointness can be detected dynamically

I re-ran the randomized tests with the restriction that all region requirements have exclusive coherence and read write access. The number of errors reported by legion spy actually increased slightly over the tests with random privileges and coherences.

After looking through the tests for a simple case and presenting it to Sean, Wonchan, and Elliot, it seems the problem is that Legion can dynamically detect when 2 partitions of an index space are disjoint, but Legion Spy does not realize this. I have attached a simple case on which Legion Spy reports an error when 2 partitions of the top level index space are disjoint, but on which Legion Spy and Legion agree when the 2 partitions overlap.

Below is a link to a gist containing a (somewhat) simple test case and a fixed version of the same case:

https://gist.github.com/dillonhuff/19c4f57a73b1fc4a8c02
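
The dynamic check Legion performs (and Legion Spy apparently lacks) amounts to verifying that no subregion of one partition overlaps any subregion of the other. A hedged sketch with subregions modeled as 1-D closed integer intervals; this is a simplification, since real index spaces can be multi-dimensional and sparse:

```cpp
#include <utility>
#include <vector>

// Hypothetical model: a subregion is a closed interval [lo, hi] of
// index points; a partition is the list of its subregions.
using Interval = std::pair<long, long>;

static bool overlaps(const Interval &a, const Interval &b)
{
    return a.first <= b.second && b.first <= a.second;
}

// Two partitions of the same index space are disjoint iff no
// subregion of one shares a point with any subregion of the other.
bool partitions_disjoint(const std::vector<Interval> &p1,
                         const std::vector<Interval> &p2)
{
    for (const Interval &a : p1)
        for (const Interval &b : p2)
            if (overlaps(a, b))
                return false;   // one shared point makes them aliased
    return true;
}
```

When this test succeeds, accesses through the two partitions cannot conflict, so the runtime can skip dependences that a purely static analysis would conservatively report.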

Legion Prof: Missing Measurements

@streichler I'm seeing the following failed assertion get triggered due to missing measurements.

OS: OS X
Where: Master branch

./cgsolver -hl:prof 1 -cat legion_prof -level 2
# some output...
Ax = A * x is done.
r = b - Ax is done.
Assertion failed: (response.has_measurement<Realm::ProfilingMeasurements::OperationMemoryUsage>()), function process_results, file /Users/samuel/devel/legion/runtime/legion_profiling.cc, line 552.
Abort trap: 6
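
The abort comes from asserting `has_measurement<OperationMemoryUsage>()` on a response that simply lacks that measurement. A more defensive consumer checks for the measurement and tolerates its absence; sketched below with a hypothetical string-keyed response type rather than Realm's real templated API:

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a profiling response: measurements are
// optional, so consumers should probe before extracting instead of
// asserting (a measurement can be missing, e.g. for an operation that
// never touched memory).
struct ProfilingResponse {
    std::map<std::string, long> measurements;

    bool has_measurement(const std::string &name) const
    { return measurements.count(name) != 0; }

    long get_measurement(const std::string &name) const
    { return measurements.at(name); }
};

// Tolerate missing data instead of aborting the whole profiling run.
long memory_usage_or_zero(const ProfilingResponse &resp)
{
    if (!resp.has_measurement("OperationMemoryUsage"))
        return 0;
    return resp.get_measurement("OperationMemoryUsage");
}
```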

Assertion failure: in lowlevel_dma.cc:2775 - Assertion `dist.find(dst_mem) != dist.end()' failed.

When running tester_io on 4 nodes:
setenv GASNET_BACKTRACE 1
${HOME}/local/bin/gasnetrun_ibv -n 4 ./tester_io -n 125000000 -s 125 -r 3 -ll:cpu 4 -ll:dma 3

I get the following assertion failure on the current DMA branch

tester_io: /users/gshipman/local/src/legion-2/runtime//lowlevel_dma.cc:2775: void LegionRuntime::LowLevel::find_shortest_path(LegionRuntime::LowLevel::Memory, LegionRuntime::LowLevel::Memory, std::vector<Realm::Memory, std::allocator<Realm::Memory> >&): Assertion `dist.find(dst_mem) != dist.end()' failed.
*** Caught a fatal signal: SIGABRT(6) on node 1/4
tester_io: /users/gshipman/local/src/legion-2/runtime//activemsg.cc:374: void SrcDataPool::release_srcptr(void*): Assertion `it != in_use.end()' failed.
*** Caught a fatal signal: SIGABRT(6) on node 0/4
tester_io: /users/gshipman/local/src/legion-2/runtime//lowlevel_dma.cc:2775: void LegionRuntime::LowLevel::find_shortest_path(LegionRuntime::LowLevel::Memory, LegionRuntime::LowLevel::Memory, std::vector<Realm::Memory, std::allocator<Realm::Memory> >&): Assertion `dist.find(dst_mem) != dist.end()' failed.
*** Caught a fatal signal: SIGABRT(6) on node 2/4
[0] /usr/bin/gstack 65423
[0] Thread 14 (Thread 0x2ab91208e700 (LWP 65424)):
[0] #0 0x00002ab90d26f373 in select () from /lib64/libc.so.6
[0] #1 0x00002ab90b30365f in service_thread_start () from /usr/projects/hpcsoft/toss2/mapache/openmpi/1.6.5-gcc-4.4/lib/libmpi.so.1
[0] #2 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #3 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 13 (Thread 0x2ab9128c3700 (LWP 65425)):
[0] #0 0x00002ab90d26d0d3 in poll () from /lib64/libc.so.6
[0] #1 0x00002ab90b301ef0 in btl_openib_async_thread () from /usr/projects/hpcsoft/toss2/mapache/openmpi/1.6.5-gcc-4.4/lib/libmpi.so.1
[0] #2 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #3 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 12 (Thread 0x2ab913b17700 (LWP 65427)):
[0] #0 0x00002ab90d25d287 in sched_yield () from /lib64/libc.so.6
[0] #1 0x0000000000d3afbe in gasneti_bt_gstack ()
[0] #2 0x0000000000d3c0ff in gasneti_print_backtrace ()
[0] #3 0x0000000000d9fb4e in gasneti_defaultSignalHandler ()
[0] #4  <signal handler called>
[0] #5 0x00002ab90d1c0625 in raise () from /lib64/libc.so.6
[0] #6 0x00002ab90d1c1e05 in abort () from /lib64/libc.so.6
[0] #7 0x00002ab90d1b974e in __assert_fail_base () from /lib64/libc.so.6
[0] #8 0x00002ab90d1b9810 in __assert_fail () from /lib64/libc.so.6
[0] #9 0x00000000009aef25 in SrcDataPool::release_srcptr(void*) ()
[0] #10 0x00000000009b1813 in SrcDataPool::release_srcptr_handler(void*, int, int) ()
[0] #11 0x0000000000d24cfc in gasnetc_rcv_reap ()
[0] #12 0x0000000000d26a8f in gasnetc_RequestGeneric ()
[0] #13 0x0000000000d159f8 in gasnetc_AMRequestLongAsyncM ()
[0] #14 0x00000000009b657f in ActiveMessageEndpoint::send_long(OutgoingMessage*, void*) ()
[0] #15 0x00000000009b31ba in ActiveMessageEndpoint::push_messages(int, bool) ()
[0] #16 0x00000000009b6e7d in EndpointManager::push_messages(int, bool) ()
[0] #17 0x00000000009b0c2f in do_some_polling() ()
[0] #18 0x00000000009b0d2c in gasnet_poll_thread_loop(void*) ()
[0] #19 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #20 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 11 (Thread 0x2ab913d18700 (LWP 65428)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x00000000009afb74 in IncomingMessageManager::get_messages(int&, bool) ()
[0] #2 0x00000000009afcfd in Realm::HandlerThread::thread_main() ()
[0] #3 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[0] #4 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #5 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 10 (Thread 0x2ab933f1c700 (LWP 65429)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[0] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[0] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[0] #4 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #5 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 9 (Thread 0x2ab93411d700 (LWP 65430)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[0] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[0] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[0] #4 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #5 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 8 (Thread 0x2ab93431e700 (LWP 65431)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[0] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[0] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[0] #4 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #5 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 7 (Thread 0x2ab93451f700 (LWP 65432)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[0] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[0] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[0] #4 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #5 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 6 (Thread 0x2ab934720700 (LWP 65433)):
[0] #0 0x00002ab90ac00264 in lll_lock_wait () from /lib64/libpthread.so.0
[0] #1 0x00002ab90abfb508 in L_lock_854 () from /lib64/libpthread.so.0
[0] #2 0x00002ab90abfb3d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
[0] #3 0x00000000009b2872 in SrcDataPool::Lock::Lock(SrcDataPool&) ()
[0] #4 0x00000000009b0103 in OutgoingMessage::reserve_srcdata() ()
[0] #5 0x00000000009b3389 in ActiveMessageEndpoint::enqueue_message(OutgoingMessage
, bool) ()
[0] #6 0x00000000009b6fcb in EndpointManager::enqueue_message(unsigned short, OutgoingMessage
, bool) ()
[0] #7 0x00000000009b152b in enqueue_message(unsigned short, int, void const
, unsigned long, void const_, unsigned long, int, void_) ()
[0] #8 0x0000000000a0d363 in ActiveMessageMediumNoReply<142, Realm::SpawnTaskMessage::RequestArgs, &(Realm::SpawnTaskMessage::handle_request(Realm::SpawnTaskMessage::RequestArgs, void const_, unsigned long))>::request(unsigned short, Realm::SpawnTaskMessage::RequestArgs&, void const_, unsigned long, int, void_) ()
[0] #9 0x0000000000a0803c in Realm::SpawnTaskMessage::send_request(unsigned short, Realm::Processor, unsigned int, void const_, unsigned long, Realm::ProfilingRequestSet const_, Realm::Event, Realm::Event, int) ()
[0] #10 0x0000000000a08359 in Realm::RemoteProcessor::spawn_task(unsigned int, void const_, unsigned long, Realm::Event, Realm::Event, int) ()
[0] #11 0x0000000000a07051 in Realm::Processor::spawn(unsigned int, void const_, unsigned long, Realm::Event, int) const ()
[0] #12 0x0000000000c74c72 in LegionRuntime::HighLevel::Runtime::issue_runtime_meta_task(void const_, unsigned long, LegionRuntime::HighLevel::HLRTaskID, LegionRuntime::HighLevel::Operation_, Realm::Event, int, Realm::Processor) ()
[0] #13 0x0000000000c5693d in LegionRuntime::HighLevel::MessageManager::send_message(bool) ()
[0] #14 0x0000000000c56840 in LegionRuntime::HighLevel::MessageManager::package_message(LegionRuntime::HighLevel::Serializer&, LegionRuntime::HighLevel::MessageManager::MessageKind, bool) ()
[0] #15 0x0000000000c55b64 in LegionRuntime::HighLevel::MessageManager::send_add_distributed_remote(LegionRuntime::HighLevel::Serializer&, bool) ()
[0] #16 0x0000000000c70966 in LegionRuntime::HighLevel::Runtime::send_add_distributed_remote(unsigned int, LegionRuntime::HighLevel::Serializer&) ()
[0] #17 0x0000000000ce991c in LegionRuntime::HighLevel::DistributedCollectable::send_remote_reference(unsigned int, unsigned int) ()
[0] #18 0x0000000000b41809 in LegionRuntime::HighLevel::RegionTreeForest::send_remote_references(std::map<LegionRuntime::HighLevel::LogicalView, LegionRuntime::HighLevel::AVXTLBitMask<512u>, std::lessLegionRuntime::HighLevel::LogicalView, LegionRuntime::HighLevel::AlignedAllocator<std::pair<LegionRuntime::HighLevel::LogicalView const, LegionRuntime::HighLevel::AVXTLBitMask<512u> > > > const&, std::set<LegionRuntime::HighLevel::PhysicalManager, std::lessLegionRuntime::HighLevel::PhysicalManager*, std::allocatorLegionRuntime::HighLevel::PhysicalManager* > const&, unsigned int) ()
[0] #19 0x0000000000ae1433 in LegionRuntime::HighLevel::IndexTask::send_remote_state(unsigned int, std::vector<unsigned int, std::allocator > const&, std::vector<unsigned int, std::allocator > const&, std::vector<unsigned int, std::allocator > const&) ()
[0] #20 0x0000000000ae175b in LegionRuntime::HighLevel::IndexTask::handle_slice_request(LegionRuntime::HighLevel::Runtime_, LegionRuntime::HighLevel::Deserializer&, unsigned int) ()
[0] #21 0x0000000000c7235f in LegionRuntime::HighLevel::Runtime::handle_slice_request(LegionRuntime::HighLevel::Deserializer&, unsigned int) ()
[0] #22 0x0000000000c57294 in LegionRuntime::HighLevel::MessageManager::handle_messages(unsigned int, char const_, unsigned long) ()
[0] #23 0x0000000000c569ff in LegionRuntime::HighLevel::MessageManager::process_message(void const_, unsigned long) ()
[0] #24 0x0000000000c72a5d in LegionRuntime::HighLevel::Runtime::process_message_task(void const_, unsigned long) ()
[0] #25 0x0000000000c7e154 in LegionRuntime::HighLevel::Runtime::high_level_runtime_task(void const_, unsigned long, Realm::Processor) ()
[0] #26 0x0000000000a09fe8 in Realm::PreemptableThread::run_task(Realm::Task_, Realm::Processor) ()
[0] #27 0x0000000000a0a420 in Realm::GreenletTask::run(void_) ()
[0] #28 0x0000000000a50100 in greenlet::run(void) ()
[0] #29 0x0000000000a4f85c in greenlet_start(void) ()
[0] #30 0x0000000000000000 in ?? ()
[0] Thread 5 (Thread 0x2ab934921700 (LWP 65434)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[0] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[0] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[0] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[0] #5 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #6 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 4 (Thread 0x2ab934b22700 (LWP 65435)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[0] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[0] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[0] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[0] #5 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #6 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 3 (Thread 0x2ab935000700 (LWP 65436)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[0] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[0] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[0] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[0] #5 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #6 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 2 (Thread 0x2ab935201700 (LWP 65437)):
[0] #0 0x00002ab90abfd5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[0] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[0] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[0] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[0] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[0] #5 0x00002ab90abf99d1 in start_thread () from /lib64/libpthread.so.0
[0] #6 0x00002ab90d2768fd in clone () from /lib64/libc.so.6
[0] Thread 1 (Thread 0x2ab911c5d1e0 (LWP 65423)):
[0] #0 0x00002ab90d23aa3d in nanosleep () from /lib64/libc.so.6
[0] #1 0x00002ab90d23a8b0 in sleep () from /lib64/libc.so.6
[0] #2 0x0000000000a412c7 in Realm::RuntimeImpl::run(unsigned int, Realm::Runtime::RunStyle, void const*, unsigned long, bool) ()
[0] #3 0x0000000000a3e00c in Realm::Runtime::run(unsigned int, Realm::Runtime::RunStyle, void const*, unsigned long, bool) ()
[0] #4 0x0000000000c7be7f in LegionRuntime::HighLevel::Runtime::start(int, char**, bool) ()
[0] #5 0x0000000000a623cf in LegionRuntime::HighLevel::HighLevelRuntime::start(int, char**, bool) ()
[0] #6 0x000000000097b33b in main ()
[2] /usr/bin/gstack 98078
[2] Thread 14 (Thread 0x2b35a9d6c700 (LWP 98079)):
[2] #0 0x00002b35a4f4d373 in select () from /lib64/libc.so.6
[2] #1 0x00002b35a2fe165f in service_thread_start () from /usr/projects/hpcsoft/toss2/mapache/openmpi/1.6.5-gcc-4.4/lib/libmpi.so.1
[2] #2 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #3 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 13 (Thread 0x2b35aa5a1700 (LWP 98080)):
[2] #0 0x00002b35a4f4b0d3 in poll () from /lib64/libc.so.6
[2] #1 0x00002b35a2fdfef0 in btl_openib_async_thread () from /usr/projects/hpcsoft/toss2/mapache/openmpi/1.6.5-gcc-4.4/lib/libmpi.so.1
[2] #2 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #3 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 12 (Thread 0x2b35ab816700 (LWP 98082)):
[2] #0 0x00002b35a4f3b287 in sched_yield () from /lib64/libc.so.6
[2] #1 0x0000000000d26931 in gasnetc_RequestGeneric ()
[2] #2 0x0000000000d15d72 in gasnetc_AMRequestShortM ()
[2] #3 0x00000000009b3d7f in ActiveMessageEndpoint::send_short(OutgoingMessage*) ()
[2] #4 0x00000000009b2f1e in ActiveMessageEndpoint::push_messages(int, bool) ()
[2] #5 0x00000000009b6e7d in EndpointManager::push_messages(int, bool) ()
[2] #6 0x00000000009b0c2f in do_some_polling() ()
[2] #7 0x00000000009b0d2c in gasnet_poll_thread_loop(void*) ()
[2] #8 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #9 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 11 (Thread 0x2b35aba17700 (LWP 98083)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x00000000009afb74 in IncomingMessageManager::get_messages(int&, bool) ()
[2] #2 0x00000000009afcfd in Realm::HandlerThread::thread_main() ()
[2] #3 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[2] #4 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #5 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 10 (Thread 0x2b35cbe00700 (LWP 98084)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread_, bool) ()
[2] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[2] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void_) ()
[2] #4 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #5 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 9 (Thread 0x2b35cc001700 (LWP 98085)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread_, bool) ()
[2] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[2] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void_) ()
[2] #4 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #5 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 8 (Thread 0x2b35cc202700 (LWP 98086)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread_, bool) ()
[2] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[2] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void_) ()
[2] #4 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #5 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 7 (Thread 0x2b35cc403700 (LWP 98087)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[2] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[2] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[2] #4 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #5 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 6 (Thread 0x2b35cc604700 (LWP 98088)):
[2] #0 0x00002b35a4f3b287 in sched_yield () from /lib64/libc.so.6
[2] #1 0x0000000000d3afbe in gasneti_bt_gstack ()
[2] #2 0x0000000000d3c0ff in gasneti_print_backtrace ()
[2] #3 0x0000000000d9fb4e in gasneti_defaultSignalHandler ()
[2] #4 <signal handler called>
[2] #5 0x00002b35a4e9e625 in raise () from /lib64/libc.so.6
[2] #6 0x00002b35a4e9fe05 in abort () from /lib64/libc.so.6
[2] #7 0x00002b35a4e9774e in __assert_fail_base () from /lib64/libc.so.6
[2] #8 0x00002b35a4e97810 in __assert_fail () from /lib64/libc.so.6
[2] #9 0x00000000009c0383 in LegionRuntime::LowLevel::find_shortest_path(Realm::Memory, Realm::Memory, std::vector<Realm::Memory, std::allocator<Realm::Memory> >&) ()
[2] #10 0x00000000009cfc39 in void LegionRuntime::LowLevel::CopyRequest::perform_new_dma<3u>(Realm::Memory, Realm::Memory) ()
[2] #11 0x00000000009c0619 in LegionRuntime::LowLevel::CopyRequest::perform_dma() ()
[2] #12 0x00000000009bec95 in LegionRuntime::LowLevel::CopyRequest::check_readiness(bool, LegionRuntime::LowLevel::DmaRequestQueue*) ()
[2] #13 0x00000000009c5251 in Realm::Domain::copy(std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, Realm::ProfilingRequestSet const&, Realm::Event, int, bool) const ()
[2] #14 0x00000000009c4acd in Realm::Domain::copy(std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, Realm::Event, int, bool) const ()
[2] #15 0x0000000000b46163 in LegionRuntime::HighLevel::RegionTreeForest::issue_copy(Realm::Domain const&, LegionRuntime::HighLevel::Operation*, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, Realm::Event) ()
[2] #16 0x0000000000b3fdfc in LegionRuntime::HighLevel::RegionTreeForest::copy_across(LegionRuntime::HighLevel::Operation*, LegionRuntime::HighLevel::RegionTreeContext, LegionRuntime::HighLevel::RegionTreeContext, LegionRuntime::HighLevel::RegionRequirement const&, LegionRuntime::HighLevel::RegionRequirement const&, LegionRuntime::HighLevel::InstanceRef const&, LegionRuntime::HighLevel::InstanceRef const&, Realm::Event) ()
[2] #17 0x0000000000a84b4f in LegionRuntime::HighLevel::CopyOp::trigger_execution() ()
[2] #18 0x0000000000c7e448 in LegionRuntime::HighLevel::Runtime::high_level_runtime_task(void const*, unsigned long, Realm::Processor) ()
[2] #19 0x0000000000a09fe8 in Realm::PreemptableThread::run_task(Realm::Task*, Realm::Processor) ()
[2] #20 0x0000000000a0a420 in Realm::GreenletTask::run(void*) ()
[2] #21 0x0000000000a50100 in greenlet::_run(void*) ()
[2] #22 0x0000000000a4f85c in greenlet_start(void*) ()
[2] #23 0x0000000000000000 in ?? ()
[2] Thread 5 (Thread 0x2b35cc805700 (LWP 98089)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[2] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[2] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[2] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[2] #5 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #6 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 4 (Thread 0x2b35cca06700 (LWP 98090)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[2] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[2] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[2] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[2] #5 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #6 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 3 (Thread 0x2b35ccc07700 (LWP 98091)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[2] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[2] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[2] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[2] #5 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #6 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 2 (Thread 0x2b35cce08700 (LWP 98092)):
[2] #0 0x00002b35a28db5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[2] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[2] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[2] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[2] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[2] #5 0x00002b35a28d79d1 in start_thread () from /lib64/libpthread.so.0
[2] #6 0x00002b35a4f548fd in clone () from /lib64/libc.so.6
[2] Thread 1 (Thread 0x2b35a993b1e0 (LWP 98078)):
[2] #0 0x00002b35a4f18a3d in nanosleep () from /lib64/libc.so.6
[2] #1 0x00002b35a4f188b0 in sleep () from /lib64/libc.so.6
[2] #2 0x0000000000a412c7 in Realm::RuntimeImpl::run(unsigned int, Realm::Runtime::RunStyle, void const*, unsigned long, bool) ()
[2] #3 0x0000000000a3e00c in Realm::Runtime::run(unsigned int, Realm::Runtime::RunStyle, void const*, unsigned long, bool) ()
[2] #4 0x0000000000c7be7f in LegionRuntime::HighLevel::Runtime::start(int, char**, bool) ()
[2] #5 0x0000000000a623cf in LegionRuntime::HighLevel::HighLevelRuntime::start(int, char**, bool) ()
[2] #6 0x000000000097b33b in main ()
[1] /usr/bin/gstack 71570
[1] Thread 14 (Thread 0x2ac20e682700 (LWP 71571)):
[1] #0 0x00002ac209863373 in select () from /lib64/libc.so.6
[1] #1 0x00002ac2078f765f in service_thread_start () from /usr/projects/hpcsoft/toss2/mapache/openmpi/1.6.5-gcc-4.4/lib/libmpi.so.1
[1] #2 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #3 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 13 (Thread 0x2ac20eeb7700 (LWP 71572)):
[1] #0 0x00002ac2098610d3 in poll () from /lib64/libc.so.6
[1] #1 0x00002ac2078f5ef0 in btl_openib_async_thread () from /usr/projects/hpcsoft/toss2/mapache/openmpi/1.6.5-gcc-4.4/lib/libmpi.so.1
[1] #2 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #3 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 12 (Thread 0x2ac21010b700 (LWP 71574)):
[1] #0 0x00002ac2071f2380 in pthread_spin_lock () from /lib64/libpthread.so.0
[1] #1 0x00002ac20e6a517f in ?? () from /usr/lib64/libipathverbs-rdmav2.so
[1] #2 0x0000000000d24a41 in gasnetc_rcv_reap ()
[1] #3 0x0000000000d25209 in gasnetc_AMPoll ()
[1] #4 0x00000000009b0c34 in do_some_polling() ()
[1] #5 0x00000000009b0d2c in gasnet_poll_thread_loop(void*) ()
[1] #6 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #7 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 11 (Thread 0x2ac21030c700 (LWP 71575)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x00000000009afb74 in IncomingMessageManager::get_messages(int&, bool) ()
[1] #2 0x00000000009afcfd in Realm::HandlerThread::thread_main() ()
[1] #3 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[1] #4 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #5 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 10 (Thread 0x2ac230510700 (LWP 71576)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[1] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[1] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[1] #4 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #5 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 9 (Thread 0x2ac230711700 (LWP 71577)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[1] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[1] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[1] #4 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #5 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 8 (Thread 0x2ac230912700 (LWP 71578)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[1] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[1] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[1] #4 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #5 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 7 (Thread 0x2ac230b13700 (LWP 71579)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x000000000098f75c in LegionRuntime::LowLevel::XferDesQueue::dequeue_xferDes(LegionRuntime::LowLevel::DMAThread*, bool) ()
[1] #2 0x000000000098c5f9 in LegionRuntime::LowLevel::DMAThread::dma_thread_loop() ()
[1] #3 0x00000000009c7792 in LegionRuntime::LowLevel::DMAThread::start(void*) ()
[1] #4 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #5 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 6 (Thread 0x2ac230d14700 (LWP 71580)):
[1] #0 0x00002ac209851287 in sched_yield () from /lib64/libc.so.6
[1] #1 0x0000000000d3afbe in gasneti_bt_gstack ()
[1] #2 0x0000000000d3c0ff in gasneti_print_backtrace ()
[1] #3 0x0000000000d9fb4e in gasneti_defaultSignalHandler ()
[1] #4 <signal handler called>
[1] #5 0x00002ac2097b4625 in raise () from /lib64/libc.so.6
[1] #6 0x00002ac2097b5e05 in abort () from /lib64/libc.so.6
[1] #7 0x00002ac2097ad74e in __assert_fail_base () from /lib64/libc.so.6
[1] #8 0x00002ac2097ad810 in __assert_fail () from /lib64/libc.so.6
[1] #9 0x00000000009c0383 in LegionRuntime::LowLevel::find_shortest_path(Realm::Memory, Realm::Memory, std::vector<Realm::Memory, std::allocator<Realm::Memory> >&) ()
[1] #10 0x00000000009cfc39 in void LegionRuntime::LowLevel::CopyRequest::perform_new_dma<3u>(Realm::Memory, Realm::Memory) ()
[1] #11 0x00000000009c0619 in LegionRuntime::LowLevel::CopyRequest::perform_dma() ()
[1] #12 0x00000000009bec95 in LegionRuntime::LowLevel::CopyRequest::check_readiness(bool, LegionRuntime::LowLevel::DmaRequestQueue*) ()
[1] #13 0x00000000009c5251 in Realm::Domain::copy(std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, Realm::ProfilingRequestSet const&, Realm::Event, int, bool) const ()
[1] #14 0x00000000009c4acd in Realm::Domain::copy(std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, Realm::Event, int, bool) const ()
[1] #15 0x0000000000b46163 in LegionRuntime::HighLevel::RegionTreeForest::issue_copy(Realm::Domain const&, LegionRuntime::HighLevel::Operation*, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, std::vector<Realm::Domain::CopySrcDstField, std::allocator<Realm::Domain::CopySrcDstField> > const&, Realm::Event) ()
[1] #16 0x0000000000b3fdfc in LegionRuntime::HighLevel::RegionTreeForest::copy_across(LegionRuntime::HighLevel::Operation*, LegionRuntime::HighLevel::RegionTreeContext, LegionRuntime::HighLevel::RegionTreeContext, LegionRuntime::HighLevel::RegionRequirement const&, LegionRuntime::HighLevel::RegionRequirement const&, LegionRuntime::HighLevel::InstanceRef const&, LegionRuntime::HighLevel::InstanceRef const&, Realm::Event) ()
[1] #17 0x0000000000a84b4f in LegionRuntime::HighLevel::CopyOp::trigger_execution() ()
[1] #18 0x0000000000c7e448 in LegionRuntime::HighLevel::Runtime::high_level_runtime_task(void const*, unsigned long, Realm::Processor) ()
[1] #19 0x0000000000a09fe8 in Realm::PreemptableThread::run_task(Realm::Task*, Realm::Processor) ()
[1] #20 0x0000000000a0a420 in Realm::GreenletTask::run(void*) ()
[1] #21 0x0000000000a50100 in greenlet::_run(void*) ()
[1] #22 0x0000000000a4f85c in greenlet_start(void*) ()
[1] #23 0x0000000000000000 in ?? ()
[1] Thread 5 (Thread 0x2ac230f15700 (LWP 71581)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[1] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[1] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[1] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[1] #5 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #6 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 4 (Thread 0x2ac231300700 (LWP 71582)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[1] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[1] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[1] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[1] #5 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #6 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 3 (Thread 0x2ac231501700 (LWP 71583)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[1] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[1] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[1] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[1] #5 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #6 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 2 (Thread 0x2ac231702700 (LWP 71584)):
[1] #0 0x00002ac2071f15bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
[1] #1 0x00000000009c6bca in GASNetCondVar::wait() ()
[1] #2 0x0000000000a0b257 in Realm::GreenletProcessor::execute_task() ()
[1] #3 0x0000000000a0a539 in Realm::GreenletThread::thread_main() ()
[1] #4 0x0000000000a0a1a9 in Realm::PreemptableThread::thread_entry(void*) ()
[1] #5 0x00002ac2071ed9d1 in start_thread () from /lib64/libpthread.so.0
[1] #6 0x00002ac20986a8fd in clone () from /lib64/libc.so.6
[1] Thread 1 (Thread 0x2ac20e2511e0 (LWP 71570)):
[1] #0 0x00002ac20982ea3d in nanosleep () from /lib64/libc.so.6
[1] #1 0x00002ac20982e8b0 in sleep () from /lib64/libc.so.6
[1] #2 0x0000000000a412c7 in Realm::RuntimeImpl::run(unsigned int, Realm::Runtime::RunStyle, void const*, unsigned long, bool) ()
[1] #3 0x0000000000a3e00c in Realm::Runtime::run(unsigned int, Realm::Runtime::RunStyle, void const*, unsigned long, bool) ()
[1] #4 0x0000000000c7be7f in LegionRuntime::HighLevel::Runtime::start(int, char**, bool) ()
[1] #5 0x0000000000a623cf in LegionRuntime::HighLevel::HighLevelRuntime::start(int, char**, bool) ()
[1] #6 0x000000000097b33b in main ()
mpirun noticed that process rank 1 with PID 71570 on node ml137 exited on signal 6 (Aborted).

*** Caught a signal: SIGTERM(15) on node 3/4

dma: wrong version of HDFMemory::create_instance called

I am seeing the assertion in the default version of HDFMemory::create_instance fail.

tester_io: /mnt/nfs/legion/runtime/lowlevel_disk.cc:245: virtual Realm::RegionInstance Realm::HDFMemory::create_instance(Realm::IndexSpace, const int*, size_t, size_t, size_t, const std::vector<long unsigned int>&, Realm::ReductionOpID, off_t, const Realm::ProfilingRequestSet&, Realm::RegionInstance): Assertion `0' failed.
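
The failure mode being described, a caller reaching a base-class default implementation whose body is just `assert(0)`, can be sketched as follows. All names here are invented for illustration and are not the actual Realm API:

```cpp
#include <cassert>
#include <cstdio>

// Invented sketch of the pattern: a memory type provides a specialized
// create_instance, but also carries a generic variant whose default body
// is an unconditional assert. If dispatch reaches the generic variant
// (the "wrong version"), the program aborts exactly as in the report above.
struct MemoryImpl {
  // Default path: not supported for this memory kind.
  virtual int create_instance_generic(int /*elem_size*/) {
    assert(0 && "wrong version of create_instance called");
    return -1;
  }
  // Specialized path callers are expected to hit.
  virtual int create_instance_hdf(int elem_size, const char *dataset) {
    std::printf("creating HDF instance for %s\n", dataset);
    return elem_size;
  }
  virtual ~MemoryImpl() {}
};
```

Under this reading, the fix belongs on the caller's side: route the request to the specialized variant rather than the inherited stub.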

This is a backtrace with line numbers.

 2: (Realm::HDFMemory::create_instance(Realm::IndexSpace, int const*, unsigned long, unsigned long, unsigned long, std::vector<unsigned long, std::allocator<unsigned long> > const&, int, long, Realm::ProfilingRequestSet const&, Realm::RegionInstance)+0x49) [0x98f2f7]
/mnt/nfs/legion/runtime/lowlevel_disk.cc:244
 3: (Realm::Domain::create_instance(Realm::Memory, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, Realm::ProfilingRequestSet const&, int) const+0xa82) [0xa2499e]
/mnt/nfs/legion/runtime/realm/idx_impl.cc:466
 4: (Realm::Domain::create_instance(Realm::Memory, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, int) const+0x4d) [0xa23ee9]
/mnt/nfs/legion/runtime/realm/idx_impl.cc:330
 5: (Realm::Domain::create_instance(Realm::Memory, unsigned long, int) const+0x97) [0xa23d37]
/mnt/nfs/legion/runtime/realm/idx_impl.cc:309 (discriminator 1)
 6: (LegionRuntime::HighLevel::RegionTreeForest::create_instance(Realm::Domain const&, Realm::Memory, unsigned long, LegionRuntime::HighLevel::Operation*)+0x9b) [0xb309f5]
/mnt/nfs/legion/runtime/region_tree.cc:4030
 7: (LegionRuntime::HighLevel::FieldSpaceNode::create_instance(Realm::Memory, Realm::Domain, std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&, unsigned long, unsigned int, LegionRuntime::HighLevel::RegionNode*, LegionRuntime::HighLevel::Operation*)+0x245) [0xb45145]
/mnt/nfs/legion/runtime/region_tree.cc:9001
 8: (LegionRuntime::HighLevel::RegionNode::create_instance(Realm::Memory, std::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&, unsigned long, unsigned int, LegionRuntime::HighLevel::Operation*)+0x78) [0xb5d77a]
/mnt/nfs/legion/runtime/region_tree.cc:16872
 9: (LegionRuntime::HighLevel::MappingTraverser<false>::map_physical_region(LegionRuntime::HighLevel::RegionNode*)+0x101e) [0xc245ba]
/mnt/nfs/legion/runtime/region_tree.cc:10942 (discriminator 1)
 10: (LegionRuntime::HighLevel::MappingTraverser<false>::visit_region(LegionRuntime::HighLevel::RegionNode*)+0x57) [0xc22995]
/mnt/nfs/legion/runtime/region_tree.cc:10663
 11: (LegionRuntime::HighLevel::RegionNode::visit_node(LegionRuntime::HighLevel::PathTraverser*)+0x2e) [0xb5ccfc]
/mnt/nfs/legion/runtime/region_tree.cc:16653
 12: (LegionRuntime::HighLevel::PathTraverser::traverse(LegionRuntime::HighLevel::RegionTreeNode*)+0xc4) [0xb47ce4]
/mnt/nfs/legion/runtime/region_tree.cc:9838
 13: (LegionRuntime::HighLevel::RegionTreeForest::map_physical_region(LegionRuntime::HighLevel::RegionTreeContext, LegionRuntime::HighLevel::RegionTreePath&, LegionRuntime::HighLevel::RegionRequirement&, unsigned int, LegionRuntime::HighLevel::Operation*, Realm::Processor, Realm::Processor, char const*, unsigned long long)+0x2bf) [0xb27e21]
/mnt/nfs/legion/runtime/region_tree.cc:1788 (discriminator 1)
 14: (LegionRuntime::HighLevel::SingleTask::map_all_regions(Realm::Processor, Realm::Event, bool)+0x6a9) [0xac0213]
/mnt/nfs/legion/runtime/legion_tasks.cc:4201 (discriminator 2)
 15: (LegionRuntime::HighLevel::PointTask::perform_mapping(bool)+0x36) [0xaca010]
/mnt/nfs/legion/runtime/legion_tasks.cc:7002
 16: (LegionRuntime::HighLevel::SliceTask::map_and_launch()+0x1d0) [0xad1cd2]
/mnt/nfs/legion/runtime/legion_tasks.cc:9440
 17: (LegionRuntime::HighLevel::MultiTask::trigger_execution()+0x26c) [0xac4cac]
/mnt/nfs/legion/runtime/legion_tasks.cc:5495
 18: (LegionRuntime::HighLevel::DeferredSlicer::perform_slice(LegionRuntime::HighLevel::SliceTask*)+0x2a) [0xad4b04]
/mnt/nfs/legion/runtime/legion_tasks.cc:10239
 19: (LegionRuntime::HighLevel::DeferredSlicer::handle_slice(void const*)+0x2f) [0xad4ba3]
/mnt/nfs/legion/runtime/legion_tasks.cc:10252
 20: (LegionRuntime::HighLevel::Runtime::high_level_runtime_task(void const*, unsigned long, Realm::Processor)+0x47e) [0xc55920]
/mnt/nfs/legion/runtime/runtime.cc:16728
 21: (Realm::PreemptableThread::run_task(Realm::Task*, Realm::Processor)+0xb8) [0xa0276c]
/mnt/nfs/legion/runtime/realm/proc_impl.cc:993 (discriminator 3)
 22: (Realm::GreenletTask::run(void*)+0x3a) [0xa02b8a]
/mnt/nfs/legion/runtime/realm/proc_impl.cc:1077
 23: (greenlet::_run(void*)+0x3c) [0xa46406]
/mnt/nfs/legion/runtime/greenlet/greenlet-cc.cc:158
 24: ./tester_io() [0xa45ba0]
/mnt/nfs/legion/runtime/greenlet/greenlet.cc:122
tester_io: /mnt/nfs/legion/runtime/lowlevel_disk.cc:245: virtual Realm::RegionInstance Realm::HDFMemory::create_instance(Realm::IndexSpace, const int*, size_t, size_t, size_t, const std::vector<long unsigned int>&, Realm::ReductionOpID, off_t, const Realm::ProfilingRequestSet&, Realm::RegionInstance): Assertion `0' failed.
*** Caught a fatal signal: SIGABRT(6) on node 0/3
[0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_JA9TVT '/mnt/nfs/legion/test/hdf_attach_subregion_parallel/./tester_io' 3017
[0] No threads.

Realm Threading Assertion

Hit this assert in the Realm threading code when running S3D in the versioning branch:

tasks.cc:932: virtual void Realm::UserThreadTaskScheduler::worker_wake(Realm::Thread*): Assertion `0' failed.

It reproduces every time. There is also the following comment in the code: "in a user-threading environment, can't just wake a thread up out of nowhere". Here is a backtrace:
#7 0x00007f4c806a1b22 in __GI___assert_fail (assertion=0x7f4c824664f7 "0", file=0x7f4c82466200 "/home/mebauer/extern_legion/legion/runtime//realm/tasks.cc", line=932, function=0x7f4c82466ec0 <Realm::UserThreadTaskScheduler::worker_wake(Realm::Thread*)::__PRETTY_FUNCTION__> "virtual void Realm::UserThreadTaskScheduler::worker_wake(Realm::Thread*)") at assert.c:101
#8 0x00007f4c81fc1d55 in Realm::UserThreadTaskScheduler::worker_wake (this=0x37381a0, to_wake=0x7f47fc3052e0) at /home/mebauer/extern_legion/legion/runtime//realm/tasks.cc:932
#9 0x00007f4c81fc04d3 in Realm::ThreadedTaskScheduler::thread_ready (this=0x37381a0, thread=0x7f47fc3052e0) at /home/mebauer/extern_legion/legion/runtime//realm/tasks.cc:370
#10 0x00007f4c81fdf9b3 in Realm::ThreadWaker<Realm::EventTriggeredCondition>::operator() (this=0x7f47fe6a6710) at /home/mebauer/extern_legion/legion/runtime//realm/threads.inl:186
#11 0x00007f4c81fd3aab in Realm::EventTriggeredCondition::Callback::event_triggered (this=0x7f47fe6a6710) at /home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc:100
#12 0x00007f4c81fd55e4 in Realm::GenEventImpl::trigger (this=0x7f481006e4b0, gen_triggered=6, trigger_node=0, wait_on=...) at /home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc:888
#13 0x00007f4c81fd52ca in Realm::GenEventImpl::trigger_current (this=0x7f481006e4b0) at /home/mebauer/extern_legion/legion/runtime//realm/event_impl.cc:804
#14 0x00007f4c81fe1aff in Realm::ReservationImpl::release (this=0x7f47fc2051a8) at /home/mebauer/extern_legion/legion/runtime//realm/rsrv_impl.cc:734
#15 0x00007f4c81fdfeaa in Realm::Reservation::release (this=0x7f480d202ca8, wait_on=...) at /home/mebauer/extern_legion/legion/runtime//realm/rsrv_impl.cc:149
#16 0x00007f4c820616c7 in LegionRuntime::HighLevel::AutoLock::~AutoLock (this=0x7f480d202ca0, __in_chrg=<optimized out>) at /home/mebauer/extern_legion/legion/runtime//legion_utilities.h:237
#17 0x00007f4c8225151b in LegionRuntime::HighLevel::Runtime::free_distributed_id (this=0x7f47fc201c20, did=18) at /home/mebauer/extern_legion/legion/runtime//runtime.cc:12809
#18 0x00007f4c822594c6 in LegionRuntime::HighLevel::Runtime::high_level_runtime_task (args=0x7f481007d740, arglen=12, p=...) at /home/mebauer/extern_legion/legion/runtime//runtime.cc:15880
#19 0x00007f4c81fbfa37 in Realm::Task::execute_on_processor (this=0x7f48100be3c0, p=...) at /home/mebauer/extern_legion/legion/runtime//realm/tasks.cc:80
#20 0x00007f4c81fc1c50 in Realm::UserThreadTaskScheduler::execute_task (this=0x37395c0, task=0x7f48100be3c0) at /home/mebauer/extern_legion/legion/runtime//realm/tasks.cc:884
#21 0x00007f4c81fc07bd in Realm::ThreadedTaskScheduler::scheduler_loop (this=0x37395c0) at /home/mebauer/extern_legion/legion/runtime//realm/tasks.cc:448
#22 0x00007f4c81fc5764 in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop> (obj=0x37395c0) at /home/mebauer/extern_legion/legion/runtime//realm/threads.inl:127
#23 0x00007f4c81fad704 in Realm::UserThread::uthread_entry () at /home/mebauer/extern_legion/legion/runtime//realm/threads.cc:740
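
The comment quoted above points at a real constraint of user-level threading: a blocked user thread can only resume when the scheduler itself switches to it on its own stack, so there is no way to wake one "out of nowhere" from another context. A toy sketch of that constraint (invented names, not the Realm implementation):

```cpp
#include <cassert>
#include <deque>

// Toy sketch: a cooperative scheduler owns all user threads and is the
// only component that may resume them, by popping its ready queue at a
// yield point. An external "wake this thread right now" has no valid
// action, which is what the Realm worker_wake() assert enforces.
struct UserThread { int id; };

struct UserThreadScheduler {
  std::deque<UserThread *> ready;

  // Correct path: mark a thread runnable; the scheduler loop will
  // switch to it the next time it picks work.
  void thread_ready(UserThread *t) { ready.push_back(t); }

  // Invalid path: mirrors the asserting worker_wake() above.
  void worker_wake(UserThread *) {
    assert(0 && "can't wake a user thread out of nowhere");
  }

  // Scheduler loop body: return the next runnable thread, if any.
  UserThread *next() {
    if (ready.empty()) return nullptr;
    UserThread *t = ready.front();
    ready.pop_front();
    return t;
  }
};
```

So a fix for this assertion likely means routing the wakeup through the scheduler's ready queue (the `thread_ready` path) instead of waking the thread directly.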
