devinamatthews commented on July 16, 2024

Since GitHub can't attach text files, here is the test program:

#include "mpi.h"
#include "ctf.hpp"

using namespace CTF;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    {
        int no = 8;
        int nv = 31;

        int len_A[6] = {nv, no, no, no, no, no};
        int sym_A[6] = {NS, NS, NS, NS, AS, NS}; 
        Tensor<> A(6, len_A, sym_A);

        int len_B[6] = {nv, nv, nv, no, no, no};
        int sym_B[6] = {NS, AS, NS, NS, AS, NS}; 
        Tensor<> B(6, len_B, sym_B);

        int len_C[8] = {nv, nv, nv, nv, no, no, no, no};
        int sym_C[8] = {NS, AS, AS, NS, NS, AS, AS, NS}; 
        Tensor<> C(8, len_C, sym_C);

        C["abcdijkl"] = A["bmnijk"]*B["acdmnl"];
    }
    MPI_Finalize();

    return 0;
}

The output is:

[c101-107:23214] *** Process received signal ***
[c101-107:23214] Signal: Segmentation fault (11)
[c101-107:23214] Signal code: Address not mapped (1)
[c101-107:23214] Failing at address: (nil)
[c101-107:23214] [ 0] /lib64/libpthread.so.0() [0x3e4180f710]
[c101-107:23214] [ 1] test.x(_ZN7CTF_int7handlerEv+0x15e) [0x4a7226]
[c101-107:23214] [ 2] test.x(_ZN7CTF_int16ctr_2d_gen_buildEiNS_8CommDataEiPiRiS2_PNS_6tensorEiRPS0_RlS7_RbS1_S7_PKiS2_S4_iS6_S7_S7_S8_S1_S7_SA_S2_S4_iS6_S7_S7_S8_S1_S7_SA_S2_+0x978) [0x5101d4]
[c101-107:23214] [ 3] test.x(_ZN7CTF_int11contraction19construct_dense_ctrEiPKNS_6iparamEPiiPKi+0xfe3) [0x5075d1]
[c101-107:23214] [ 4] test.x(_ZN7CTF_int11contraction13construct_ctrEiPKNS_6iparamEPii+0x34e) [0x509812]
[c101-107:23214] [ 5] test.x(_ZN7CTF_int11contraction16get_best_exh_mapEPKNS_12distributionES3_S3_PNS_8topologyES5_S5_PKNS_7mappingES8_S8_RiRdd+0xb70) [0x504cb0]
[c101-107:23214] [ 6] test.x(_ZN7CTF_int11contraction3mapEPPNS_3ctrEb+0x7da) [0x50593a]
[c101-107:23214] [ 7] test.x(_ZN7CTF_int11contraction12sym_contractEv+0xac9) [0x50ba5b]
[c101-107:23214] [ 8] test.x(_ZN7CTF_int11contraction12sym_contractEv+0xbc5) [0x50bb57]
[c101-107:23214] [ 9] test.x(_ZN7CTF_int11contraction12sym_contractEv+0xbc5) [0x50bb57]
[c101-107:23214] [10] test.x(_ZN7CTF_int11contraction13home_contractEv+0x66a) [0x50ca24]
[c101-107:23214] [11] test.x(_ZN7CTF_int11contraction7executeEv+0x2e) [0x4f9bc0]
[c101-107:23214] [12] test.x(_ZNK7CTF_int13Contract_Term7executeEN3CTF10Idx_TensorE+0x735) [0x4e4367]
[c101-107:23214] [13] test.x(_ZN3CTF10Idx_TensoraSERKN7CTF_int4TermE+0x126) [0x4a98b2]
[c101-107:23214] [14] test.x(main+0x3ab) [0x49882b]
[c101-107:23214] [15] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3e4141ed1d]
[c101-107:23214] [16] test.x() [0x494729]
[c101-107:23214] *** End of error message ***
[c101-102.ices.utexas.edu][[21830,1],9][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[c101-111.ices.utexas.edu][[21830,1],44][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
srun: error: c101-107: task 36: Segmentation fault (core dumped)
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: got SIGCONT
slurmd[c101-101]: *** STEP 152902.0 CANCELLED AT 2016-01-07T16:09:47 ***
slurmd[c101-101]: *** JOB 152902 CANCELLED AT 2016-01-07T16:09:47 ***
srun: forcing job termination
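
The frames in the trace above are printed as mangled C++ symbols. As a reading aid, here is a minimal sketch of a helper that turns one of those symbols into a readable name. It assumes a GCC- or Clang-compatible toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the function name `demangle` is hypothetical and is not part of CTF.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang; assumed available)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper, not part of CTF: returns the demangled form of an
// Itanium-ABI symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller has to release it with free.
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;

    std::string result(out);
    std::free(out);
    return result;
}
```

Applied to frame [1], `demangle("_ZN7CTF_int7handlerEv")` gives `CTF_int::handler()`, and frame [11] becomes `CTF_int::contraction::execute()`. Piping the whole trace through the `c++filt` tool is an alternative that needs no code.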

devinamatthews commented on July 16, 2024

Also, this snippet is extracted from an Aquarius job (which fails after ~24 hrs :(). A similar job (different no/nv) completes successfully, while another does not trigger this assert but produces bogus results.

solomonik commented on July 16, 2024

OK will debug this tomorrow morning.

jeffhammond commented on July 16, 2024

@devinamatthews https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/ appears at the top of the page if you google for "github issues attach files" ;-)

devinamatthews commented on July 16, 2024

Right, but I'm not gonna put a .txt on a .cxx file, it's just wrong. It's a programming site for Christ's sake.

solomonik commented on July 16, 2024

I think I fixed it on branch dbl_phys_map_bug. I reproduced the trace on my desktop by disabling actual data allocation, since the failure happens during construction of the contraction algorithm. It happens when there is a nested physical mapping that corresponds to a mismatch of physical dimensions, which means data is communicated in some level of SUMMA along this dimensional mapping. The SUMMA algorithm construction complains about it with an assert. I made such mappings invalid. I will think tomorrow about whether they actually make sense and whether what I did can have performance consequences, after which I will merge something into master. But I think what's on the branch should work (although I never actually fully executed your test case). Sorry to hear a 24 hr run was wasted.

jeffhammond commented on July 16, 2024

@devinamatthews gz is the suffix of most relevance...

solomonik commented on July 16, 2024

I think the fix I implemented last night was correct. Devin's test now executes without error with 48 processes on Edison. I merged it into master. I am now benchmarking performance on more nodes, but I am so happy that -ipo works on Edison now that I did not even wait for all the results.
