Comments (8)
Since GitHub can't attach source files, here is the test program:
#include "mpi.h"
#include "ctf.hpp"

using namespace CTF;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    {
        int no = 8;
        int nv = 31;

        int len_A[6] = {nv, no, no, no, no, no};
        int sym_A[6] = {NS, NS, NS, NS, AS, NS};
        Tensor<> A(6, len_A, sym_A);

        int len_B[6] = {nv, nv, nv, no, no, no};
        int sym_B[6] = {NS, AS, NS, NS, AS, NS};
        Tensor<> B(6, len_B, sym_B);

        int len_C[8] = {nv, nv, nv, nv, no, no, no, no};
        int sym_C[8] = {NS, AS, AS, NS, NS, AS, AS, NS};
        Tensor<> C(8, len_C, sym_C);

        C["abcdijkl"] = A["bmnijk"]*B["acdmnl"];
    }
    MPI_Finalize();
    return 0;
}
The output is:
[c101-107:23214] *** Process received signal ***
[c101-107:23214] Signal: Segmentation fault (11)
[c101-107:23214] Signal code: Address not mapped (1)
[c101-107:23214] Failing at address: (nil)
[c101-107:23214] [ 0] /lib64/libpthread.so.0() [0x3e4180f710]
[c101-107:23214] [ 1] test.x(_ZN7CTF_int7handlerEv+0x15e) [0x4a7226]
[c101-107:23214] [ 2] test.x(_ZN7CTF_int16ctr_2d_gen_buildEiNS_8CommDataEiPiRiS2_PNS_6tensorEiRPS0_RlS7_RbS1_S7_PKiS2_S4_iS6_S7_S7_S8_S1_S7_SA_S2_S4_iS6_S7_S7_S8_S1_S7_SA_S2_+0x978) [0x5101d4]
[c101-107:23214] [ 3] test.x(_ZN7CTF_int11contraction19construct_dense_ctrEiPKNS_6iparamEPiiPKi+0xfe3) [0x5075d1]
[c101-107:23214] [ 4] test.x(_ZN7CTF_int11contraction13construct_ctrEiPKNS_6iparamEPii+0x34e) [0x509812]
[c101-107:23214] [ 5] test.x(_ZN7CTF_int11contraction16get_best_exh_mapEPKNS_12distributionES3_S3_PNS_8topologyES5_S5_PKNS_7mappingES8_S8_RiRdd+0xb70) [0x504cb0]
[c101-107:23214] [ 6] test.x(_ZN7CTF_int11contraction3mapEPPNS_3ctrEb+0x7da) [0x50593a]
[c101-107:23214] [ 7] test.x(_ZN7CTF_int11contraction12sym_contractEv+0xac9) [0x50ba5b]
[c101-107:23214] [ 8] test.x(_ZN7CTF_int11contraction12sym_contractEv+0xbc5) [0x50bb57]
[c101-107:23214] [ 9] test.x(_ZN7CTF_int11contraction12sym_contractEv+0xbc5) [0x50bb57]
[c101-107:23214] [10] test.x(_ZN7CTF_int11contraction13home_contractEv+0x66a) [0x50ca24]
[c101-107:23214] [11] test.x(_ZN7CTF_int11contraction7executeEv+0x2e) [0x4f9bc0]
[c101-107:23214] [12] test.x(_ZNK7CTF_int13Contract_Term7executeEN3CTF10Idx_TensorE+0x735) [0x4e4367]
[c101-107:23214] [13] test.x(_ZN3CTF10Idx_TensoraSERKN7CTF_int4TermE+0x126) [0x4a98b2]
[c101-107:23214] [14] test.x(main+0x3ab) [0x49882b]
[c101-107:23214] [15] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3e4141ed1d]
[c101-107:23214] [16] test.x() [0x494729]
[c101-107:23214] *** End of error message ***
[c101-102.ices.utexas.edu][[21830,1],9][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[c101-111.ices.utexas.edu][[21830,1],44][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
srun: error: c101-107: task 36: Segmentation fault (core dumped)
srun: Job step aborted: Waiting up to 2 seconds for job step to finish.
srun: got SIGCONT
slurmd[c101-101]: *** STEP 152902.0 CANCELLED AT 2016-01-07T16:09:47 ***
slurmd[c101-101]: *** JOB 152902 CANCELLED AT 2016-01-07T16:09:47 ***
srun: forcing job termination
from ctf.
Also, this snippet is extracted from an Aquarius job (which fails after ~24 hrs :(). A similar job (different no/nv) completes successfully, while another does not trigger this assert but produces bogus results.
from ctf.
OK will debug this tomorrow morning.
from ctf.
@devinamatthews https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/ appears at the top of the page if you google for "github issues attach files" ;-)
from ctf.
Right, but I'm not gonna put a .txt on a .cxx file, it's just wrong. It's a programming site for Christ's sake.
from ctf.
I think I fixed it on branch dbl_phys_map_bug. I reproduced the trace on my desktop by disabling actual data allocation, since the failure happens during construction of the contraction algorithm. It occurs when there is a nested physical mapping that corresponds to a mismatch of physical dimensions, which means data is communicated at some level of SUMMA along this dimensional mapping. The SUMMA algorithm construction complains about it with an assert. I made such mappings invalid. Tomorrow I will think about whether they actually make sense and whether what I did can have performance consequences, after which I will merge something into master. But I think what's on the branch should work (although I never actually fully executed your test case). Sorry to hear a 24 hr run was wasted.
from ctf.
@devinamatthews gz is the suffix of most relevance...
from ctf.
I think the fix I implemented last night was correct. Devin's test now executes without error with 48 processes on Edison, and I merged the fix into master. I am now benchmarking performance on more nodes, but I am so happy that -ipo works on Edison now that I did not even wait for all the results.
from ctf.