Comments (7)
Zhihao, can you provide the exact command line you're using?
from legion.
I think the failure can be reproduced with the following command line:
GASNET_BACKTRACE=1 mpirun -n 2 --bind-to none -H n0000,n0001 ./fluid3d -input in_300K.fluid -nbx 2
This partitions the entire computation into 2 pieces and runs each piece on one node.
Sincerely,
Zhihao
from legion.
My bad... please ignore the previous email; that command leads to a fluid3d failure case. I hit the active message failure when running the dma_random test in the dma branch. I am not sure whether it can be reproduced in the master branch.
Sincerely,
Zhihao
from legion.
The following steps should help reproduce #50.
Step 1: Check out a recent version of the dma branch.
Step 2: Enter the test/dma_random folder.
Step 3: Run GASNET_BACKTRACE=1 mpirun -n 2 --bind-to none -H n0000,n0001 dma_random -ll:rsize 1024 -ll:dsize 1024 -ll:ahandlers 2
Sincerely,
Zhihao
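The three steps above can be sketched as a small script. This is only an illustration: the host names, rank count, and -ll: flags are copied from the comment, while the branch name `dma`, the function name `repro_cmd`, and the `DRY_RUN` switch are assumptions added here. It also assumes dma_random has already been built in test/dma_random.

```shell
#!/bin/sh
# Sketch of the reproduction steps. repro_cmd and DRY_RUN are hypothetical
# names introduced for illustration; they are not part of Legion.

# Print the Step 3 launch command as a single line.
repro_cmd() {
    echo "GASNET_BACKTRACE=1 mpirun -n 2 --bind-to none -H n0000,n0001" \
         "dma_random -ll:rsize 1024 -ll:dsize 1024 -ll:ahandlers 2"
}

# By default only print the command, so nothing is launched by accident.
# With DRY_RUN=0, perform Steps 1-3 from the root of a Legion checkout.
if [ "${DRY_RUN:-1}" = "0" ]; then
    # Step 1: switch to the dma branch (branch name is an assumption).
    git checkout dma || exit 1
    # Step 2: enter the test folder.
    cd test/dma_random || exit 1
    # Step 3: launch the test on the two nodes.
    eval "$(repro_cmd)"
else
    repro_cmd
fi
```

Running the script with no environment set prints the command for inspection; running it with DRY_RUN=0 from the repository root carries out the steps.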
from legion.
Zhihao, when I run the above command now, I reliably get this error instead:
dma_random: ../../runtime/realm/mem_impl.cc:604: virtual void Realm::RemoteMemory::get_bytes(off_t, void*, size_t): Assertion `kind == MemoryImpl::MKIND_RDMA' failed.
*** Caught a fatal signal: SIGABRT(6) on node 0/2
Can you see if this is a new issue hiding the old one, or maybe the old one is fixed and this one was hiding behind it?
from legion.
Hi Sean,
The bug you encountered is an issue that occurs before #50. It is caused by the check_correctness task trying to read a remote instance that is not RDMA-able. I have pushed a fix to the dma branch. You should now be able to reproduce #50 on n0000,n0001. By the way, don't forget to remove disk_file* before every execution; otherwise, the disk memory sanity check may fail.
I am working on merging your threading changes into the dma branch. Do you think the threading changes may fix #50?
Sincerely,
Zhihao
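The cleanup reminder above could be wrapped in a small helper so it is not forgotten between runs. This is a minimal sketch: the function name `clean_disk_files` is hypothetical, and it assumes the disk_file* files are written to the directory the test is launched from, which the thread does not state.

```shell
#!/bin/sh
# Hypothetical helper: delete leftover disk_file* files before a run.
# Assumes the files live in the given directory (default: current directory).
clean_disk_files() {
    dir="${1:-.}"
    # -f keeps rm quiet when no file matches the pattern.
    rm -f "$dir"/disk_file*
}
```

Calling `clean_disk_files .` in test/dma_random before each mpirun invocation would cover the node where the command is launched; if the two nodes do not share a filesystem, the same cleanup would presumably be needed on each node.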
from legion.
This has been replaced by #61, which appears to be an issue inside the new DMA code.
from legion.