
Comments (7)

streichler commented on July 17, 2024

Zhihao, can you provide the exact command line you're using?

from legion.

jiazhihao commented on July 17, 2024

I think the failure can be reproduced with the following command line:

GASNET_BACKTRACE=1 mpirun -n 2 --bind-to none -H n0000,n0001 ./fluid3d -input in_300K.fluid -nbx 2

In this case, we partition the entire computation into 2 pieces and run each piece on one node.

Sincerely,
Zhihao


jiazhihao commented on July 17, 2024

My bad, please ignore the previous email: that command leads to a fluid3d failure, not this one. I hit the active message failure when running the dma_random test in the dma branch. I am not sure whether it can be reproduced in the master branch.

Sincerely,
Zhihao


jiazhihao commented on July 17, 2024

The following steps should reproduce #50.

Step 1: Check out a recent version of the dma branch.
Step 2: Enter the test/dma_random folder.
Step 3: GASNET_BACKTRACE=1 mpirun -n 2 --bind-to none -H n0000,n0001 dma_random -ll:rsize 1024 -ll:dsize 1024 -ll:ahandlers 2
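For repeated runs, the reproduction steps above and the advice later in this thread to delete disk_file* before every execution could be wrapped in a small bash helper. This is a hypothetical sketch, not part of the Legion repository: the function name repro_issue50 and the DRY_RUN switch are made up for illustration, and it assumes the dma branch is already checked out and the current directory is test/dma_random.

```shell
# Hypothetical helper (not part of Legion). Source this file from
# test/dma_random in a bash shell, then call repro_issue50.
repro_issue50() {
  # Leftover disk_file* files from an earlier run can make the disk memory
  # sanity check fail, so delete them first (-f: no error if none exist).
  rm -f disk_file*

  # The dma_random command from the reproduction steps, as an argv array.
  local -a cmd
  cmd=(mpirun -n 2 --bind-to none -H n0000,n0001
       dma_random -ll:rsize 1024 -ll:dsize 1024 -ll:ahandlers 2)

  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Hypothetical dry-run mode: print the command line instead of launching.
    echo "GASNET_BACKTRACE=1 ${cmd[*]}"
  else
    # GASNET_BACKTRACE=1 asks GASNet to print a backtrace on a fatal error.
    GASNET_BACKTRACE=1 "${cmd[@]}"
  fi
}
```

After saving it as, say, repro_issue50.sh, run `source repro_issue50.sh && repro_issue50`, or `DRY_RUN=1 repro_issue50` to check the command line without starting the MPI job.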

Sincerely,
Zhihao


streichler commented on July 17, 2024

Zhihao, when I run the above command now, I reliably get this error instead:
dma_random: ../../runtime/realm/mem_impl.cc:604: virtual void Realm::RemoteMemory::get_bytes(off_t, void*, size_t): Assertion `kind == MemoryImpl::MKIND_RDMA' failed.
*** Caught a fatal signal: SIGABRT(6) on node 0/2

Can you see if this is a new issue hiding the old one, or whether the old one is fixed and this one was hiding behind it?


jiazhihao commented on July 17, 2024

Hi Sean,

The bug you encountered is an issue that occurs before #50 is reached. It is caused by the check_correctness task trying to read a remote instance that is not RDMA-able. I have pushed a fix into the dma branch, so you should now be able to reproduce #50 on n0000,n0001. By the way, don't forget to remove disk_file* before every execution; otherwise, the disk memory sanity check may fail.

I am working on merging your threading changes into the dma branch. Do you think the threading changes might fix #50?

Sincerely,
Zhihao


streichler commented on July 17, 2024

This has been replaced by #61, which appears to be an issue inside the new DMA code.

