cyclops-community / ctf

Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays

License: Other

Languages: Makefile 0.74%, C++ 86.92%, C 0.02%, Cuda 0.46%, Shell 0.37%, Python 3.23%, Cython 8.27%

ctf's People

mapleyustat devinamatthews januseriksen justusc alejandrogallo yangjunpro ruiqian1 grseb9s yashasvisharma elsong2 sunqm ct-clmsn hehuannb solomonik yuzhiliu kostrzewa wentaoy2 zhcui ivanredbread itachixc chu604 pecanjk jasonzhang928 kabicm ryanlevy whu-dft yuchenpang zeta1999 zoonono raghavendrak linjianma cocteautwins philliphelms navjo2323 cc4s airmler presciman timjbaer rohany geertiebear jeffhammond huanghua1994 mahmozaffari jthies zhang677 cyclops-community dulsinea ajaypanyala mlouhivu realab-ai huttered40 toschaefer

ctf's Issues

Assert triggered in ctr_2d_gen_build

The attached program hits an assertion in ctr_2d_gen_build (ctr_2d_general.cxx, line 129) on 48 processors. Some numerology:

- It fails on rank 36, and only on rank 36.
- It fails only with 48 cores (using OMP_NUM_THREADS=1).
- It fails for no/nv = 8/31, but not for 6/40 or 4/19.

Backtrace is:

/home1/dmatthews/src/aquarius/src/external/ctf/src/interface/common.cxx:182
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/ctr_2d_general.cxx:129
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:3268
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:3794
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:2697
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:2845
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:4377
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:4400
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:4400
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:4578
/home1/dmatthews/src/aquarius/src/external/ctf/src/contraction/contraction.cxx:108
/home1/dmatthews/src/aquarius/src/external/ctf/src/interface/term.cxx:499
/home1/dmatthews/src/aquarius/src/external/ctf/src/interface/idx_tensor.cxx:194
/home1/dmatthews/src/aquarius/src/external/ctf/include/../src/interface/idx_tensor.h:223

3D Cholesky factorization performance comparison using CTF Python

A while back, we discussed a simple CTF Python benchmark comparing the performance of PAA's 3D parallel Cholesky factorization with the simple recursive algorithm written in CTF Python. Let me know if that is still useful. It might be interesting to see whether one significantly outperforms the other.
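For reference, the simple recursive algorithm splits the matrix into 2x2 blocks, factors the leading block, solves a triangular system for the off-diagonal block, and recurses on the Schur complement. The sketch below is a sequential NumPy version written only to show that structure; it is not CTF Python code, and `rec_cholesky` is a hypothetical name. A CTF Python version would presumably apply the same steps to distributed tensor slices, but nothing here assumes anything about CTF's API.

```python
import numpy as np


def rec_cholesky(A):
    """Return lower-triangular L with A = L @ L.T (A symmetric positive definite)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        if A[0, 0] <= 0:
            raise ValueError("matrix is not positive definite")
        return np.sqrt(A)
    h = n // 2
    A11, A21, A22 = A[:h, :h], A[h:, :h], A[h:, h:]
    # 1. Factor the leading block: A11 = L11 @ L11.T
    L11 = rec_cholesky(A11)
    # 2. Triangular solve: L21 @ L11.T = A21  <=>  L11 @ L21.T = A21.T
    L21 = np.linalg.solve(L11, A21.T).T
    # 3. Recurse on the Schur complement of A11
    L22 = rec_cholesky(A22 - L21 @ L21.T)
    # Assemble the full lower-triangular factor
    L = np.zeros_like(A)
    L[:h, :h] = L11
    L[h:, :h] = L21
    L[h:, h:] = L22
    return L
```

Because the Cholesky factor with a positive diagonal is unique, the result can be checked directly against `np.linalg.cholesky`.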

hstack and vstack

Implement via setitem. cc @RagnaroWA @tdog191
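Implementing these via setitem means allocating the concatenated output once and then assigning each input into its own slice. A minimal sketch, using NumPy arrays as a stand-in for CTF tensors: it assumes the target type supports NumPy-style slice assignment, handles 2-D inputs only, and the function bodies are illustrative rather than CTF's implementation.

```python
import numpy as np


def hstack(parts):
    """Concatenate 2-D arrays left to right by assigning into column slices."""
    rows = parts[0].shape[0]
    if any(p.shape[0] != rows for p in parts):
        raise ValueError("hstack: all inputs need the same number of rows")
    cols = sum(p.shape[1] for p in parts)
    out = np.empty((rows, cols), dtype=np.result_type(*parts))
    c0 = 0
    for p in parts:
        out[:, c0:c0 + p.shape[1]] = p  # __setitem__ on a column slice
        c0 += p.shape[1]
    return out


def vstack(parts):
    """Concatenate 2-D arrays top to bottom by assigning into row slices."""
    cols = parts[0].shape[1]
    if any(p.shape[1] != cols for p in parts):
        raise ValueError("vstack: all inputs need the same number of columns")
    rows = sum(p.shape[0] for p in parts)
    out = np.empty((rows, cols), dtype=np.result_type(*parts))
    r0 = 0
    for p in parts:
        out[r0:r0 + p.shape[0], :] = p  # __setitem__ on a row slice
        r0 += p.shape[0]
    return out
```

Allocating once and filling slices avoids building intermediate concatenations, which matters more for distributed tensors than for local arrays.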

Batched matrix multiplication integration

Order of including omp.h matters for compilation

Hello,

I get an annoying compilation error when I include omp.h after ctf.hpp. I am using gcc 4.9, MVAPICH2, and Cyclops (CTF). Here is a simple program that reproduces the problem; the error goes away if omp.h is included before ctf.hpp.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <ctf.hpp>
#include <omp.h>  // included AFTER ctf.hpp: this order triggers the compile error

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  {
    // Scope the CTF objects so their destructors run before MPI_Finalize.
    long n = 400 * 400;
    long m = 333;
    int nthread = omp_get_max_threads();  // needs omp.h
    CTF::World dw(MPI_COMM_WORLD);
    CTF::Matrix<double> A_MN(m, n, dw);
    CTF::Matrix<double> J_12(m, m, dw);
    CTF::Matrix<double> Q_MN(m, n, dw);
    A_MN.fill_random(0, 1);
    J_12.fill_random(0, 1);
    // Q (m x n) = J (m x m) * A (m x n); one index letter per tensor dimension
    Q_MN["ij"] = J_12["ik"] * A_MN["kj"];
    double nrm = Q_MN.norm2();  // collective: every rank must call it
    if (dw.rank == 0)
      printf("threads: %d, Q_NORM: %8.8f\n", nthread, nrm);
  }
  MPI_Finalize();
  return 0;
}

make pylib fails on Ubuntu

I could not build the pylib make target (and therefore the Python interface). I'm working on the python branch.
Here is what I did:

I started from a fresh Ubuntu 17.10 installation with all updates applied, then:
- sudo apt install build-essential
- sudo apt install openmpi-bin libopenmpi-dev
- sudo apt install libblas-dev liblapack-dev gfortran
- set Python 3.6 as the default: sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.6 1
- installed the Python packages that seemed to be required: sudo apt install cython3 python3-numpy python3-mpi4py

Now trying to build:

1. ./configure

Checking compiler type/version... Using GNU compilers.
Checking whether __APPLE__ is defined... no.
Checking compiler (CXX)... successful.
Checking availability of C++11... successful.
Checking flags (CXXFLAGS)... successful.
Checking if OpenMP is provided... using OpenMP via -fopenmp flag.
Checking whether BLAS works... BLAS library was not specified, using -lblas (with underscores).
Checking whether to use CUDA... cuda will not be used.
A config.mk file has been created (to adjust all settings edit the config.mk file manually or rerun ./configure to create a new one).
Configure finished successfully.

-> this seems OK.

2. make

make ctflib
make[1]: Entering directory '/home/me/Downloads/pythonbranch/ctf'
make ctf -C src; 
make[2]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src'
make -C interface
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/interface'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c common.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/common.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c flop_counter.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/flop_counter.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c world.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/world.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c idx_tensor.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/idx_tensor.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c term.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/term.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c schedule.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/schedule.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c semiring.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/semiring.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c partition.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/partition.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c fun_term.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/fun_term.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c monoid.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/monoid.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c set.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/set.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/interface'
make -C shared
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/shared'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c util.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/util.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c memcontrol.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/memcontrol.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c int_timer.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/int_timer.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c model.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/model.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c init_models.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/init_models.o
mpicxx -x c++ -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c offload.cu -o /home/me/Downloads/pythonbranch/ctf/obj/offload.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/shared'
make -C tensor
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/tensor'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c untyped_tensor.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/untyped_tensor.o
untyped_tensor.cxx: In member function ‘int CTF_int::tensor::sparsify(std::function<bool(const char*)>)’:
untyped_tensor.cxx:1297:15: warning: argument 1 value ‘18446744073709551608’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
       int64_t nnz_blk_old[calc_nvirt()];
               ^~~~~~~~~~~
untyped_tensor.cxx:1297:15: note: in a call to built-in allocation function ‘void* __builtin_alloca_with_align(long unsigned int, long unsigned int)’
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c algstrct.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/algstrct.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/tensor'
make -C symmetry
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/symmetry'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sym_indices.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sym_indices.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c symmetrization.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/symmetrization.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/symmetry'
make -C mapping
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/mapping'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c mapping.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/mapping.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c distribution.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/distribution.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c topology.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/topology.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/mapping'
make -C redistribution
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/redistribution'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c redist.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/redist.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sparse_rw.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sparse_rw.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c pad.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/pad.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c nosym_transp.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/nosym_transp.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c cyclic_reshuffle.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/cyclic_reshuffle.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c glb_cyclic_reshuffle.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/glb_cyclic_reshuffle.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c dgtog_redist.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/dgtog_redist.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c dgtog_calc_cnt.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/dgtog_calc_cnt.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/redistribution'
make -C scaling
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/scaling'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c scaling.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/scaling.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sym_seq_scl.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sym_seq_scl.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c scale_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/scale_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c strp_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/strp_tsr.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/scaling'
make -C summation
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/summation'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c summation.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/summation.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sym_seq_sum.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sym_seq_sum.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sum_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sum_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c spr_seq_sum.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/spr_seq_sum.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c spsum_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/spsum_tsr.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/summation'
make -C contraction
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/contraction'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c contraction.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/contraction.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sym_seq_ctr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sym_seq_ctr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c ctr_offload.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/ctr_offload.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c ctr_comm.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/ctr_comm.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c ctr_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/ctr_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c ctr_2d_general.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/ctr_2d_general.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c sp_seq_ctr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/sp_seq_ctr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c spctr_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/spctr_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c spctr_comm.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/spctr_comm.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c spctr_2d_general.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/spctr_2d_general.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c spctr_offload.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/spctr_offload.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/contraction'
make -C sparse_formats
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/sparse_formats'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c coo.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/coo.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -c csr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj/csr.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/sparse_formats'
make[2]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src'
ar -crs /home/me/Downloads/pythonbranch/ctf/lib/libctf.a /home/me/Downloads/pythonbranch/ctf/obj/*.o; 
make[1]: Leaving directory '/home/me/Downloads/pythonbranch/ctf'
make ctflibso
make[1]: Entering directory '/home/me/Downloads/pythonbranch/ctf'
make ctf -C src; 
make[2]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src'
make -C interface
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/interface'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c common.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/common.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c flop_counter.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/flop_counter.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c world.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/world.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c idx_tensor.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/idx_tensor.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c term.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/term.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c schedule.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/schedule.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c semiring.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/semiring.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c partition.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/partition.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c fun_term.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/fun_term.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c monoid.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/monoid.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c set.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/set.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/interface'
make -C shared
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/shared'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c util.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/util.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c memcontrol.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/memcontrol.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c int_timer.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/int_timer.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c model.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/model.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c init_models.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/init_models.o
mpicxx -x c++ -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c offload.cu -o /home/me/Downloads/pythonbranch/ctf/obj_shared/offload.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/shared'
make -C tensor
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/tensor'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c untyped_tensor.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/untyped_tensor.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c algstrct.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/algstrct.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/tensor'
make -C symmetry
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/symmetry'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sym_indices.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sym_indices.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c symmetrization.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/symmetrization.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/symmetry'
make -C mapping
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/mapping'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c mapping.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/mapping.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c distribution.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/distribution.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c topology.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/topology.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/mapping'
make -C redistribution
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/redistribution'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c redist.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/redist.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sparse_rw.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sparse_rw.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c pad.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/pad.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c nosym_transp.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/nosym_transp.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c cyclic_reshuffle.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/cyclic_reshuffle.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c glb_cyclic_reshuffle.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/glb_cyclic_reshuffle.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c dgtog_redist.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/dgtog_redist.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c dgtog_calc_cnt.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/dgtog_calc_cnt.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/redistribution'
make -C scaling
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/scaling'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c scaling.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/scaling.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sym_seq_scl.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sym_seq_scl.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c scale_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/scale_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c strp_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/strp_tsr.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/scaling'
make -C summation
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/summation'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c summation.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/summation.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sym_seq_sum.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sym_seq_sum.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sum_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sum_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c spr_seq_sum.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/spr_seq_sum.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c spsum_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/spsum_tsr.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/summation'
make -C contraction
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/contraction'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c contraction.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/contraction.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sym_seq_ctr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sym_seq_ctr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c ctr_offload.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/ctr_offload.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c ctr_comm.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/ctr_comm.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c ctr_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/ctr_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c ctr_2d_general.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/ctr_2d_general.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c sp_seq_ctr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/sp_seq_ctr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c spctr_tsr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/spctr_tsr.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c spctr_comm.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/spctr_comm.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c spctr_2d_general.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/spctr_2d_general.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c spctr_offload.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/spctr_offload.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/contraction'
make -C sparse_formats
make[3]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src/sparse_formats'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c coo.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/coo.o
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c csr.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_shared/csr.o
make[3]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src/sparse_formats'
make[2]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src'
make ctf_ext_objs -C src_python; 
make[2]: Entering directory '/home/me/Downloads/pythonbranch/ctf/src_python'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -c ctf_ext.cxx -o /home/me/Downloads/pythonbranch/ctf/obj_ext/ctf_ext.o
make[2]: Leaving directory '/home/me/Downloads/pythonbranch/ctf/src_python'
mpicxx -O3 -fopenmp -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1   -fPIC -shared -o /home/me/Downloads/pythonbranch/ctf/lib_shared/libctf.so /home/me/Downloads/pythonbranch/ctf/obj_shared/*.o /home/me/Downloads/pythonbranch/ctf/obj_ext/*.o; 
make[1]: Leaving directory '/home/me/Downloads/pythonbranch/ctf'

-> this also seems OK.
3. make pylib, which gave me the first failure:

cd src_python && LDFLAGS="-L../lib_shared" python setup.py build_ext --inplace  && cd ..
running build_ext
building 'ctf.core' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-sXpGnM/python3.6-3.6.3=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I../include -I. -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/include/python3.6m -c ctf/core.cpp -o build/temp.linux-x86_64-3.6/ctf/core.o -std=c++11 -O0 -g
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /usr/lib/python3/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1788:0,
                 from /usr/lib/python3/dist-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                 from /usr/lib/python3/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from ctf/core.cpp:493:
/usr/lib/python3/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^~~~~~~
ctf/core.cpp:496:10: fatal error: mpi.h: No such file or directory
 #include "mpi.h"
          ^~~~~~~
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Makefile:109: recipe for target 'pylib' failed
make: *** [pylib] Error 1

I could resolve this by editing src_python/setup.py, replacing include_dirs=["../include",".",numpy.get_include()] with include_dirs=["../include",".",numpy.get_include(),"/usr/lib/x86_64-linux-gnu/openmpi/include"].
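Rather than hard-coding the Open MPI include path, setup.py could ask the MPI compiler wrapper for it. Open MPI's wrappers (such as mpicc) accept `--showme:incdirs`, which prints the include directories; MPICH's wrappers do not, so this sketch simply returns an empty list when the query fails. `mpi_include_dirs` is a hypothetical helper, not part of CTF's setup.py, and it avoids `capture_output` so that it also runs on the Python 3.6 used here.

```python
import shlex
import shutil
import subprocess


def mpi_include_dirs(wrapper="mpicc"):
    """Best-effort list of MPI include directories (Open MPI wrappers only)."""
    exe = shutil.which(wrapper)
    if exe is None:
        return []  # wrapper not on PATH
    try:
        result = subprocess.run(
            [exe, "--showme:incdirs"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []  # e.g. an MPICH wrapper that rejects --showme
    return shlex.split(result.stdout)
```

In setup.py this would be used as `include_dirs=["../include", ".", numpy.get_include()] + mpi_include_dirs()`.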
Running make pylib again:

cd src_python && LDFLAGS="-L../lib_shared" python setup.py build_ext --inplace  && cd ..
running build_ext
building 'ctf.core' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.6-sXpGnM/python3.6-3.6.3=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I../include -I. -I/usr/lib/python3/dist-packages/numpy/core/include -I/usr/lib/x86_64-linux-gnu/openmpi/include -I/usr/include/python3.6m -c ctf/core.cpp -o build/temp.linux-x86_64-3.6/ctf/core.o -std=c++11 -O0 -g
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /usr/lib/python3/dist-packages/numpy/core/include/numpy/ndarraytypes.h:1788:0,
                 from /usr/lib/python3/dist-packages/numpy/core/include/numpy/ndarrayobject.h:18,
                 from /usr/lib/python3/dist-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from ctf/core.cpp:493:
/usr/lib/python3/dist-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
  ^~~~~~~
ctf/core.cpp: In function ‘PyObject* __Pyx_PyInt_LshiftObjC(PyObject*, PyObject*, long int, int)’:
ctf/core.cpp:94940:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
                 if (unlikely(!(b < sizeof(long)*8 && a == x >> b)) && a) {
                                ~~^~~~~~~~~~~~~~~~
ctf/core.cpp:675:43: note: in definition of macro ‘unlikely’
   #define unlikely(x) __builtin_expect(!!(x), 0)
                                           ^
In file included from ../include/../src/interface/group.h:57:0,
                 from ../include/../src/interface/monoid.h:142,
                 from ../include/../src/interface/set.h:597,
                 from ../include/../src/interface/tensor.h:5,
                 from ../include/ctf.hpp:17,
                 from ctf/core.cpp:497:
../include/../src/interface/semiring.h: In instantiation of ‘dtype CTF_int::default_mul(dtype, dtype) [with dtype = bool]’:
../include/../src/interface/semiring.h:293:19:   required from ‘CTF::Semiring<dtype, is_ord>::Semiring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/ring.h:25:40:   required from ‘CTF::Ring<dtype, is_ord>::Ring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/tensor.h:208:43:   required from here
../include/../src/interface/semiring.h:13:13: warning: ‘*’ in boolean context, suggest ‘&&’ instead [-Wint-in-bool-context]
     return a*b;
            ~^~
../include/../src/interface/semiring.h: In instantiation of ‘void CTF_int::default_gemm(char, char, int, int, int, dtype, const dtype*, const dtype*, dtype, dtype*) [with dtype = bool]’:
../include/../src/interface/semiring.h:294:19:   required from ‘CTF::Semiring<dtype, is_ord>::Semiring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/ring.h:25:40:   required from ‘CTF::Ring<dtype, is_ord>::Ring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/tensor.h:208:43:   required from here
../include/../src/interface/semiring.h:98:18: warning: ‘*’ in boolean context, suggest ‘&&’ instead [-Wint-in-bool-context]
         C[j*m+i] *= beta;
         ~~~~~~~~~^~~~~~~
../include/../src/interface/semiring.h: In instantiation of ‘void CTF_int::default_scal(int, dtype, dtype*, int) [with dtype = bool]’:
../include/../src/interface/semiring.h:296:19:   required from ‘CTF::Semiring<dtype, is_ord>::Semiring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/ring.h:25:40:   required from ‘CTF::Ring<dtype, is_ord>::Ring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/tensor.h:208:43:   required from here
../include/../src/interface/semiring.h:50:17: warning: ‘*’ in boolean context, suggest ‘&&’ instead [-Wint-in-bool-context]
       X[incX*i] *= alpha;
       ~~~~~~~~~~^~~~~~~~
../include/../src/interface/semiring.h: In instantiation of ‘void CTF_int::default_coomm(int, int, int, dtype, const dtype*, const int*, const int*, int, const dtype*, dtype, dtype*) [with dtype = bool]’:
../include/../src/interface/semiring.h:297:19:   required from ‘CTF::Semiring<dtype, is_ord>::Semiring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/ring.h:25:40:   required from ‘CTF::Ring<dtype, is_ord>::Ring() [with dtype = bool; bool is_ord = true]’
../include/../src/interface/tensor.h:208:43:   required from here
../include/../src/interface/semiring.h:183:18: warning: ‘*’ in boolean context, suggest ‘&&’ instead [-Wint-in-bool-context]
         C[j*m+i] *= beta;
         ~~~~~~~~~^~~~~~~
x86_64-linux-gnu-g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -specs=/usr/share/dpkg/no-pie-link.specs -Wl,-z,relro -L../lib_shared -g -fdebug-prefix-map=/build/python3.6-sXpGnM/python3.6-3.6.3=. -specs=/usr/share/dpkg/no-pie-compile.specs -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.6/ctf/core.o -L../lib_shared -lctf -lblas -lmpicxx -o /home/me/Downloads/pythonbranch/ctf/src_python/ctf/core.cpython-36m-x86_64-linux-gnu.so -std=c++11
/usr/bin/ld: cannot find -lmpicxx
collect2: error: ld returned 1 exit status
error: command 'x86_64-linux-gnu-g++' failed with exit status 1
Makefile:109: recipe for target 'pylib' failed
make: *** [pylib] Error 1

At this point I don't know how to proceed. Thanks in advance for the help.

Matrix Market I/O in CTF

I'm not totally sure if this is the correct venue for CTF usage questions---if not, please redirect me.

What's the most straightforward way to load Matrix Market format sparse matrices into CTF? I'm using the C++ API, but could switch to Python.

I don't see a Matrix Market reader in the source. Do I need to use my own parallel reader to read in chunks to a CSR on each process and then initialize a Tensor with them? Which constructor should I use?
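If no built-in reader is available, one option is a user-side parser. Below is a minimal sketch of the parsing step only, for the `coordinate real general` variant of the format. It assumes that CTF's global index puts the first index fastest (key = i + j*m for an m-by-n matrix). The resulting key/value arrays match the kind of input that `Tensor::write(npair, global_idx, data)` takes, but that call and the indexing convention should be confirmed in the CTF documentation. `MMEntries` and `read_matrix_market` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the parsed entries of one Matrix Market file.
struct MMEntries {
  int64_t nrows = 0, ncols = 0;
  std::vector<int64_t> keys;   // global indices, assumed key = i + j*nrows (0-based)
  std::vector<double> values;  // matching nonzero values
};

// Parses a "%%MatrixMarket matrix coordinate real general" stream.
// Symmetric, pattern, and complex variants are not handled in this sketch.
MMEntries read_matrix_market(std::istream& in) {
  MMEntries out;
  std::string line;
  // Skip the banner, comment lines (starting with '%'), and blank lines.
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] != '%') break;
  }
  int64_t nnz = 0;
  std::istringstream size_line(line);
  if (!(size_line >> out.nrows >> out.ncols >> nnz))
    throw std::runtime_error("missing size line");
  out.keys.reserve(nnz);
  out.values.reserve(nnz);
  for (int64_t k = 0; k < nnz; ++k) {
    int64_t i, j;
    double v;
    if (!(in >> i >> j >> v)) throw std::runtime_error("truncated entry list");
    // Matrix Market indices are 1-based; convert to 0-based before forming the key.
    out.keys.push_back((i - 1) + (j - 1) * out.nrows);
    out.values.push_back(v);
  }
  return out;
}
```

In a parallel setting, each process could parse a disjoint chunk of the entry lines and then call write collectively; that splitting is left out of the sketch.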

link doxygen on github front page

I recommend setting http://ctf.eecs.berkeley.edu as the home page so that people who come to this page first know where the docs are.

If it's in the README, one has to scroll down to see that...

simple SVD fails with MPI

The following snippet fails only with 2 or more MPI processes:

int lens[3] = {1,2,1}; //needs to be this; {1,3,1} works
Tensor<float> T(3, false, lens, dw);
T.fill_random(-1.,1.);
T.print();
Tensor<float> U, S, V1;
T["ijk"].svd(U["ia"],S["a"],V1["akj"]); //hangs here...

However, it works when the indices are reordered:

T["ijk"].svd(U["ija"],S["a"],V1["ak"]); // this works
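To see which matrix each call factors, assume that the indices T shares with U form the rows and the indices it shares with V1 form the columns. Under that reading, the hanging call factors a 1-by-2 matrix (row i; columns k,j), while the working call factors a 2-by-1 matrix (rows i,j; column k). Because lens {1,3,1} reportedly works in the first form, the wide shape alone does not explain the hang. The helper below is a hypothetical reading aid that computes this matricized shape from the index strings; it does not reproduce CTF's internal logic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Extent of the matricized dimension formed by the indices of `factor` that also
// appear in `tensor`; the new SVD index (absent from `tensor`) is skipped.
int64_t group_extent(const std::string& tensor, const int* lens, const std::string& factor) {
  int64_t extent = 1;
  for (char c : factor) {
    std::size_t pos = tensor.find(c);
    if (pos != std::string::npos) extent *= lens[pos];
  }
  return extent;
}

// Returns {rows, cols} of the matrix that T[t] = U[u] S V[v] would factor.
std::pair<int64_t, int64_t> svd_shape(const std::string& t, const int* lens,
                                      const std::string& u, const std::string& v) {
  return {group_extent(t, lens, u), group_extent(t, lens, v)};
}
```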

reshape performance improvements

Reshaping tensors is currently done at the Python level via sparse get_local_data and write; this should be optimized to handle special cases (e.g. sequential execution) efficiently when possible.
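The key translation that a get_local_data/write reshape performs can be sketched as follows. This sketch rests on two assumptions: (1) CTF's global index puts the first tensor index fastest, and (2) the Python reshape follows NumPy's default row-major (C-order) semantics. Under those assumptions, each stored entry's key can be translated independently. `remap_key` and its helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Total number of elements for the given lengths.
int64_t element_count(const std::vector<int64_t>& lens) {
  int64_t n = 1;
  for (int64_t len : lens) n *= len;
  return n;
}

// Multi-index of a global key, first index fastest (assumed CTF ordering).
std::vector<int64_t> key_to_index(int64_t key, const std::vector<int64_t>& lens) {
  std::vector<int64_t> idx(lens.size());
  for (std::size_t d = 0; d < lens.size(); ++d) {
    idx[d] = key % lens[d];
    key /= lens[d];
  }
  return idx;
}

// Global key of a multi-index, first index fastest (inverse of key_to_index).
int64_t index_to_key(const std::vector<int64_t>& idx, const std::vector<int64_t>& lens) {
  int64_t key = 0, stride = 1;
  for (std::size_t d = 0; d < lens.size(); ++d) {
    key += idx[d] * stride;
    stride *= lens[d];
  }
  return key;
}

// Key of the same element after a NumPy-style (row-major) reshape from old_lens to new_lens.
int64_t remap_key(int64_t key, const std::vector<int64_t>& old_lens,
                  const std::vector<int64_t>& new_lens) {
  assert(element_count(old_lens) == element_count(new_lens));
  // The row-major position (last index fastest) is what a NumPy reshape preserves.
  std::vector<int64_t> old_idx = key_to_index(key, old_lens);
  int64_t pos = 0;
  for (std::size_t d = 0; d < old_lens.size(); ++d) pos = pos * old_lens[d] + old_idx[d];
  // Split that position into a multi-index of the new shape, last index fastest.
  std::vector<int64_t> new_idx(new_lens.size());
  for (std::size_t d = new_lens.size(); d-- > 0;) {
    new_idx[d] = pos % new_lens[d];
    pos /= new_lens[d];
  }
  return index_to_key(new_idx, new_lens);
}
```

On a single process, every local key can be remapped in one pass with no communication. That is presumably the kind of special case the issue has in mind.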

unable to scale expressions by std::complex<double>

In a contraction expression involving tensors of std::complex, it is not possible to scale the expression by a complex number. Scaling by a real number works fine, however.

Problems with automated build on OSX

Apple clang does not support -Bdynamic/-Bstatic (fixed manually by removing all -Wl,-Bdynamic and -Wl,-Bstatic from the configure file)
The latest OpenMPI does not seem to provide MPI::COMPLEX_DOUBLE (fixed manually by replacing it with MPI_CXX_COMPLEX_DOUBLE)
C++11 is not supported with the usual flag for the Python build (fixed manually by adding -stdlib=libc++ -mmacosx-version-min=10.7 to extra_compile_args, as suggested in automl/auto-sklearn#304)

All of these should be automated when Apple clang is detected.

do not create new nonzeros when running contraction-like Transform

The Transform that modifies one tensor based on one other tensor (a generalization of summation) currently modifies only the nonzeros of the output tensor. However, the Transform that modifies a tensor based on two other tensors (a generalization of contraction) currently generates new nonzeros. This enhancement will enable prescreened output sparsity.
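To make the requested semantics concrete, here is a sketch of a contraction-like transform over coordinate-format sparse matrices that updates only entries already present in the output. C(i,k) is touched only if (i,k) is already a nonzero of C, so the output sparsity pattern never grows. It uses `std::map` as a stand-in for CTF's sparse storage and a user function f(a, b, c&); `transform_existing` and the other names are hypothetical, not CTF API.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <map>
#include <utility>

using Coord = std::pair<int, int>;
using SparseMat = std::map<Coord, double>;  // (row, col) -> value

// Applies f(A(i,j), B(j,k), C(i,k)) for every j, but only where C(i,k) already exists,
// so no new nonzeros are created in C (prescreened output sparsity).
void transform_existing(const SparseMat& A, const SparseMat& B, SparseMat& C,
                        const std::function<void(double, double, double&)>& f) {
  for (auto& entry : C) {
    const int i = entry.first.first, k = entry.first.second;
    // Walk the nonzeros of row i of A (the map is ordered by row, then column).
    for (auto a = A.lower_bound(Coord(i, std::numeric_limits<int>::min()));
         a != A.end() && a->first.first == i; ++a) {
      const int j = a->first.second;
      auto b = B.find(Coord(j, k));
      if (b != B.end()) f(a->second, b->second, entry.second);
    }
  }
}
```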

Better messages for unexpected exits

It would be great if there were some message indicating why CTF is exiting abnormally, something like:

"Out of memory" (or "posix_memalign returned XXX")
"Assertion failed on line XXX of YYY"

This would make tracking down issues much easier and help to distinguish from genuine segfaults. Also, wouldn't it be better to use MPI_Abort rather than dereference NULL?
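A minimal sketch of the requested reporting uses a hypothetical `REPORT_ASSERT` macro and a `checked_alloc` wrapper that prints the `posix_memalign` return code. In an MPI program, the `std::abort()` call would be replaced by `MPI_Abort(MPI_COMM_WORLD, 1)` so that the other ranks are torn down as well. `std::abort()` is used here only so that the sketch builds without MPI. None of these names come from CTF.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <stdlib.h>  // posix_memalign (POSIX)
#include <string>

// Builds the message text separately so it can be tested and reused.
inline std::string assertion_message(const char* expr, const char* file, int line) {
  return std::string("Assertion failed: ") + expr + " on line " + std::to_string(line) +
         " of " + file;
}

// Prints a diagnostic and terminates. In an MPI build, MPI_Abort(MPI_COMM_WORLD, 1)
// would go here instead of std::abort() so that all ranks are stopped.
[[noreturn]] inline void fatal_exit(const std::string& msg) {
  std::fprintf(stderr, "%s\n", msg.c_str());
  std::fflush(stderr);
  std::abort();
}

// Hypothetical assertion macro that reports the failing expression, line, and file.
#define REPORT_ASSERT(expr) \
  do { if (!(expr)) fatal_exit(assertion_message(#expr, __FILE__, __LINE__)); } while (0)

// Aligned allocation that reports the posix_memalign error code instead of returning NULL.
inline void* checked_alloc(std::size_t bytes) {
  void* ptr = nullptr;
  int ret = ::posix_memalign(&ptr, 64, bytes);
  if (ret != 0)
    fatal_exit("Out of memory: posix_memalign returned " + std::to_string(ret) +
               " for " + std::to_string(bytes) + " bytes");
  return ptr;
}
```

Because `__FILE__` and `__LINE__` are expanded inside the macro, the report points at the call site rather than at the helper function.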

How do I fill a local tensor with different distribution into Cyclops?

Hello everyone,

I wasn't sure if I should email someone about this problem, so I decided to post an issue.

I have been working on writing a parallel density-fitting Fock builder using this library. I had a previous code working with Global Arrays (GA), but I was always having problems with memory allocation and/or other parts of the code behaving oddly. I decided to try Cyclops to see if it would fix these problems. Plus, I really like the sparsity handling feature, so I was trying it out. My code works in serial (with and without sparsity), but I am having trouble figuring out how to fill a parallel tensor object.

Anyway, I have an array that is distributed along the slowest index (mn|Q), where the Q rows are statically distributed over N processors. I was able to do this quite simply with GA, since it lets you specify the distribution the tensor is defined on.

In Cyclops, I just want to fill the tensor using the local data available on each processor (in my array), which does not seem to correspond to the distribution Cyclops uses. I realize that the distribution influences performance for contractions and parallel linear algebra calls. I tried using write(int64_t, int64_t*, double* values) and forcing write to use the distribution that is on my processor, but this throws a segmentation fault when allocating the pairs (https://github.com/solomonik/ctf/blob/master/src/interface/tensor.cxx#L260).

Does the data passed to write have to have the same layout as the data returned by read_local? Is there another way to write to the tensor, given that my local data is not in the same distribution as the tensor?
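Each process could describe its locally owned slab by global keys, assuming that CTF's `write(npair, global_idx, values)` accepts arbitrary global indices on any process and redistributes them. The sketch also assumes that the global index puts the first index fastest, i.e. key = m + n*M + q*M*N for a tensor of lengths {M, N, Q}. The static block distribution of Q and the names `block_range` and `local_slab_keys` are hypothetical stand-ins for the GA-style layout described above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Range [q_begin, q_end) of the slowest index owned by `rank` under a static block distribution.
void block_range(int64_t Q, int nproc, int rank, int64_t& q_begin, int64_t& q_end) {
  int64_t base = Q / nproc, extra = Q % nproc;
  q_begin = rank * base + (rank < extra ? rank : extra);
  q_end = q_begin + base + (rank < extra ? 1 : 0);
}

// Global keys for every (m, n, q) element this rank owns, ordered with m fastest,
// assuming key = m + n*M + q*M*N (first index fastest).
std::vector<int64_t> local_slab_keys(int64_t M, int64_t N, int64_t Q, int nproc, int rank) {
  int64_t q_begin, q_end;
  block_range(Q, nproc, rank, q_begin, q_end);
  std::vector<int64_t> keys;
  keys.reserve((q_end - q_begin) * M * N);
  for (int64_t q = q_begin; q < q_end; ++q)
    for (int64_t n = 0; n < N; ++n)
      for (int64_t m = 0; m < M; ++m)
        keys.push_back(m + n * M + q * M * N);
  return keys;
}
```

Because q is the slowest index, these keys form one contiguous range. If the local (mn|Q) array is stored with m fastest, it can be passed as the values array alongside the keys without reordering.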

Required MPI3 support

Currently OpenMPI 1.6.5 (and possibly other pre-MPI3 libraries) can't compile CTF. We should clarify:

Is MPI3 support required (or which MPI3 features are required)?
What minimum version of OpenMPI/MPICH etc. is required?

(Note that OpenMPI 1.6.5 did work until recently, so perhaps full MPI3 support isn't necessarily required?)

optimize check_key_ranges

The routine is not threaded and has been observed to act as a bottleneck within sparse data read/write.

MPI_DOUBLE_COMPLEX is an "invalid datatype"

OpenMPI 1.10 on OSX gives the following error when using the MPI_DOUBLE_COMPLEX datatype (from C++):

[Devins-iMac:62206] *** An error occurred in MPI_Alltoallv
[Devins-iMac:62206] *** reported by process [1094451201,0]
[Devins-iMac:62206] *** on communicator MPI_COMM_WORLD
[Devins-iMac:62206] *** MPI_ERR_TYPE: invalid datatype
[Devins-iMac:62206] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Devins-iMac:62206] *** and potentially your MPI job)

A simple program to test this is:

#include "mpi.h"

int main(int argc, char **argv)
{
    char a[16], b[16];
    int n = 1, d = 0;
    MPI_Init(&argc, &argv);
    MPI_Alltoallv(&a, &n, &d, MPI_DOUBLE_COMPLEX,
                  &b, &n, &d, MPI_DOUBLE_COMPLEX, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

In the context of CTF_int::CommData::all_to_allv it might be better to just use MPI_BYTE and scale the counts/displs accordingly.
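The bookkeeping for the suggested workaround converts per-rank element counts and displacements into byte counts for an MPI_BYTE exchange. MPI count arguments are `int`, so the sketch checks for overflow. The function name is hypothetical, and the actual MPI_Alltoallv call is omitted so that the sketch builds without MPI.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Scales element counts/displacements by the element size (e.g. 16 bytes for a
// double complex), producing the int arrays MPI_Alltoallv expects with MPI_BYTE.
void to_byte_counts(const std::vector<int64_t>& counts, const std::vector<int64_t>& displs,
                    int64_t elem_size, std::vector<int>& byte_counts,
                    std::vector<int>& byte_displs) {
  assert(counts.size() == displs.size());
  byte_counts.resize(counts.size());
  byte_displs.resize(displs.size());
  for (std::size_t r = 0; r < counts.size(); ++r) {
    int64_t c = counts[r] * elem_size, d = displs[r] * elem_size;
    if (c > INT_MAX || d > INT_MAX)
      throw std::overflow_error("byte count exceeds int range of MPI count arguments");
    byte_counts[r] = static_cast<int>(c);
    byte_displs[r] = static_cast<int>(d);
  }
}
```

MPI_BYTE hides the element type from the MPI library. This is harmless for pure data movement such as an all-to-all exchange, but it would not work for reductions, which need the real datatype.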

Values of tensors not getting printed into a file

I've come across a small but annoying issue.
When I print a tensor to a file like

FILE *fp;
fp = fopen("myfile.ctf", "w");
myTensor.print(fp);
fclose(fp);

I get everything but the values of the tensor, for example:

[0][0](0, <>)
[1][0](1, <>)
[2][0](2, <>)
[3][0](3, <>)
[4][0](4, <>)
[5][0](5, <>)
[6][0](6, <>)
[7][0](7, <>)
[8][0](8, <>)

I looked at the source code, and I think this is because the implementation of the print method in
src/tensor/untyped_tensor.cxx (line 1799) does not pass the fp argument to the
print method of the algebraic structure sr.
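The suspected bug pattern is easy to reproduce in isolation. When the element printer takes `FILE * fp = stdout` as a defaulted argument, a caller that forgets to forward its own fp still compiles. The values then silently go to the terminal, while the surrounding `(index, <...>)` text goes to the file. The types below are hypothetical stand-ins, not the actual CTF classes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for an algebraic structure that knows how to print one element.
struct DoubleStructure {
  // The stdout default is what lets a missing argument go unnoticed.
  void print(const double* value, FILE* fp = stdout) const { std::fprintf(fp, "%g", *value); }
};

// Writes "(key, <value>)" lines; the fix is simply forwarding fp to sr.print.
void print_pairs(const DoubleStructure& sr, const int64_t* keys, const double* values,
                 int n, FILE* fp) {
  for (int i = 0; i < n; ++i) {
    std::fprintf(fp, "(%lld, <", static_cast<long long>(keys[i]));
    sr.print(&values[i], fp);  // omitting fp here reproduces the empty "<>" in the file
    std::fprintf(fp, ">)\n");
  }
}
```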

Add support for Hadamard indices in contractions with sparse tensors

Currently it is not possible to do something like B["ij"]=A["ij"]*B["ij"] with sparse A and B; consider instead using a summation with a special operator, e.g. Transform<>([](double a, double & b){ b*=a; })(A,B);. However, there is no such simple workaround for computations like C["ijk"]=A["ijl"]*B["ljk"], so this is a high-priority enhancement. For now, please do this with dense tensors and expect a fix for the sparse case soon.
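For reference, B["ij"]=A["ij"]*B["ij"] on sparse operands means that an entry of the result can be nonzero only where both A and B store one. The sketch below expresses this over hypothetical key-to-value maps (keys being global indices). It illustrates the semantics only, not how CTF would implement them.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using SparseTensor = std::map<int64_t, double>;  // global index -> value

// In-place Hadamard product B = A .* B on sparse operands: entries of B with no
// matching entry in A become implicit zeros and are erased.
void hadamard_inplace(const SparseTensor& A, SparseTensor& B) {
  for (auto it = B.begin(); it != B.end();) {
    auto a = A.find(it->first);
    if (a == A.end()) {
      it = B.erase(it);
    } else {
      it->second *= a->second;
      ++it;
    }
  }
}
```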

CTF build error on Mac OSX.

I am trying to build the CTF Python interface on my Mac. After configuring successfully, I get the following error when running make python; make python_install fails with the same error. Note that I have built this library successfully on a Linux machine, but this is my first time trying the build on OS X.

After compilation (which produced a number of warnings), the relevant output is:

27 warnings generated.
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall  -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK   -fPIC -shared -o /Users/huttered40/Documents/phD/ResearchProjects/ctf/lib_shared/libctf.so /Users/huttered40/Documents/phD/ResearchProjects/ctf/obj_shared/*.o /Users/huttered40/Documents/phD/ResearchProjects/ctf/obj_ext/*.o    
ld: warning: text-based stub file /System/Library/Frameworks//OpenCL.framework/Versions/A/OpenCL.tbd and library file /System/Library/Frameworks//OpenCL.framework/Versions/A/OpenCL are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//OpenGL.framework/Versions/A/OpenGL.tbd and library file /System/Library/Frameworks//OpenGL.framework/Versions/A/OpenGL are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//IOKit.framework/Versions/A/IOKit.tbd and library file /System/Library/Frameworks//IOKit.framework/Versions/A/IOKit are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//CoreGraphics.framework/Versions/A/CoreGraphics.tbd and library file /System/Library/Frameworks//CoreGraphics.framework/Versions/A/CoreGraphics are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Foundation.framework/Versions/C/Foundation.tbd and library file /System/Library/Frameworks//Foundation.framework/Versions/C/Foundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGFXShared.tbd and library file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGFXShared.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//IOSurface.framework/Versions/A/IOSurface.tbd and library file /System/Library/Frameworks//IOSurface.framework/Versions/A/IOSurface are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/Versions/A/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/Versions/A/CoreFoundation are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//CFNetwork.framework/Versions/A/CFNetwork.tbd and library file /System/Library/Frameworks//CFNetwork.framework/Versions/A/CFNetwork are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//CoreServices.framework/Versions/A/CoreServices.tbd and library file /System/Library/Frameworks//CoreServices.framework/Versions/A/CoreServices are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//ApplicationServices.framework/Versions/A/ApplicationServices.tbd and library file /System/Library/Frameworks//ApplicationServices.framework/Versions/A/ApplicationServices are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//Security.framework/Versions/A/Security.tbd and library file /System/Library/Frameworks//Security.framework/Versions/A/Security are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLU.tbd and library file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLU.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGL.tbd and library file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGL.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//DiskArbitration.framework/Versions/A/DiskArbitration.tbd and library file /System/Library/Frameworks//DiskArbitration.framework/Versions/A/DiskArbitration are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLImage.tbd and library file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libGLImage.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/PrivateFrameworks/IOAccelerator.framework/Versions/A/IOAccelerator.tbd and library file /System/Library/PrivateFrameworks/IOAccelerator.framework/Versions/A/IOAccelerator are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/OSServices.framework/Versions/A/OSServices.tbd and library file /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/OSServices.framework/Versions/A/OSServices are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/LaunchServices.tbd and library file /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/LaunchServices.framework/Versions/A/LaunchServices are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SharedFileList.framework/Versions/A/SharedFileList.tbd and library file /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/SharedFileList.framework/Versions/A/SharedFileList are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//CoreText.framework/Versions/A/CoreText.tbd and library file /System/Library/Frameworks//CoreText.framework/Versions/A/CoreText are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//ImageIO.framework/Versions/A/ImageIO.tbd and library file /System/Library/Frameworks//ImageIO.framework/Versions/A/ImageIO are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/ATS.tbd and library file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/ATS are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/HIServices.framework/Versions/A/HIServices.tbd and library file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/HIServices.framework/Versions/A/HIServices are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/SpeechSynthesis.framework/Versions/A/SpeechSynthesis.tbd and library file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/SpeechSynthesis.framework/Versions/A/SpeechSynthesis are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCVMSPluginSupport.tbd and library file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCVMSPluginSupport.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenDirectory.framework/Versions/A/Frameworks/CFOpenDirectory.framework/Versions/A/CFOpenDirectory.tbd and library file /System/Library/Frameworks/OpenDirectory.framework/Versions/A/Frameworks/CFOpenDirectory.framework/Versions/A/CFOpenDirectory are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCoreVMClient.tbd and library file /System/Library/Frameworks/OpenGL.framework/Versions/A/Libraries/libCoreVMClient.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libJPEG.tbd and library file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libJPEG.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libTIFF.tbd and library file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libTIFF.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libPng.tbd and library file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libPng.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libGIF.tbd and library file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libGIF.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libJP2.tbd and library file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libJP2.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libRadiance.tbd and library file /System/Library/Frameworks/ImageIO.framework/Versions/A/Resources/libRadiance.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//CoreAudio.framework/Versions/A/CoreAudio.tbd and library file /System/Library/Frameworks//CoreAudio.framework/Versions/A/CoreAudio are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//AudioToolbox.framework/Versions/A/AudioToolbox.tbd and library file /System/Library/Frameworks//AudioToolbox.framework/Versions/A/AudioToolbox are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks//ServiceManagement.framework/Versions/A/ServiceManagement.tbd and library file /System/Library/Frameworks//ServiceManagement.framework/Versions/A/ServiceManagement are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/libFontParser.tbd and library file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/libFontParser.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/libFontRegistry.tbd and library file /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/libFontRegistry.dylib are out of sync. Falling back to library file for linking.
ld: warning: text-based stub file /System/Library/PrivateFrameworks/APFS.framework/Versions/A/APFS.tbd and library file /System/Library/PrivateFrameworks/APFS.framework/Versions/A/APFS are out of sync. Falling back to library file for linking.
cd src_python; \
	ln -sf /Users/huttered40/Documents/phD/ResearchProjects/ctf/setup.py setup.py; \
	mkdir -p /Users/huttered40/Documents/phD/ResearchProjects/ctf/lib_python/ctf && cp ctf/__init__.py /Users/huttered40/Documents/phD/ResearchProjects/ctf/lib_python/ctf/; \
	LDFLAGS="-L/Users/huttered40/Documents/phD/ResearchProjects/ctf/lib_shared" python setup.py build_ext --force -b /Users/huttered40/Documents/phD/ResearchProjects/ctf/lib_python/ -t /Users/huttered40/Documents/phD/ResearchProjects/ctf/lib_python/; \
	rm setup.py; \
	cd ..;
Compiling ctf/core.pyx because it changed.
Compiling ctf/random.pyx because it changed.
Cythonizing ctf/core.pyx

Error compiling Cython file:
------------------------------------------------------------
...
        World()
        World(int)

    cdef cppclass Idx_Tensor(Term):
        Idx_Tensor(ctensor *, char *);
        void operator=(Term B);
                     ^
------------------------------------------------------------

ctf/core.pyx:223:22: Overloading operator '=' not yet supported.

Error compiling Cython file:
------------------------------------------------------------
...
        World(int)

    cdef cppclass Idx_Tensor(Term):
        Idx_Tensor(ctensor *, char *);
        void operator=(Term B);
        void operator=(Idx_Tensor B);
                     ^
------------------------------------------------------------

ctf/core.pyx:224:22: Overloading operator '=' not yet supported.

Error compiling Cython file:
------------------------------------------------------------
...
        void operator=(Idx_Tensor B);
        void multeq(double scl);

    cdef cppclass Typ_Idx_Tensor[dtype](Idx_Tensor):
        Typ_Idx_Tensor(ctensor *, char *)
        void operator=(Term B)
                     ^
------------------------------------------------------------

ctf/core.pyx:229:22: Overloading operator '=' not yet supported.

Error compiling Cython file:
------------------------------------------------------------
...
        void multeq(double scl);

    cdef cppclass Typ_Idx_Tensor[dtype](Idx_Tensor):
        Typ_Idx_Tensor(ctensor *, char *)
        void operator=(Term B)
        void operator=(Idx_Tensor B)
                     ^
------------------------------------------------------------

ctf/core.pyx:230:22: Overloading operator '=' not yet supported.

Traceback (most recent call last):
  File "setup.py", line 27, in <module>
    setup(name="CTF",packages=["ctf"],version="1.5.4",cmdclass = {'build_ext': build_ext},ext_modules = cythonize(ext_mods))
  File "/Users/huttered40/anaconda/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 825, in cythonize
    cythonize_one(*args[1:])
  File "/Users/huttered40/anaconda/lib/python2.7/site-packages/Cython/Build/Dependencies.py", line 944, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: ctf/core.pyx

Generated config.mk:

### INSTALLATION TARGET DIRECTORY (for make install)
INSTALL_DIR = /usr/local/

### LINK TIME LIBRARIES AND FLAGS
#libraries and flags for link time (irrelevant if only building CTF lib and not examples/tests)
LIB_PATH     =
LIB_FILES    =
LINKFLAGS    =
LD_LIB_PATH  =
SO_LIB_PATH  =
SO_LIB_FILES =
LDFLAGS      =


### COMPILE TIME INCLUDES AND FLAGS
#C++ compiler 
CXX         = mpicxx
#includes for compile time
INCLUDES    =
#optimization flags, some intel compiler versions may run into errors when using -fast or -ipo
CXXFLAGS    = -O3 -std=c++0x -DOMP_OFF -Wall
#command to make library out of object files
AR          = ar

#macros to be defined throughout the code, use the below in combination with appropriate external libraries
#Include in DEFS -DUSE_LAPACK to build with LAPACK functionality, 
#                -DUSE_SCALAPACK to build with SCALAPACK functionality
#                -DUSE_BATCH_GEMM to build {without, with} batched BLAS routines
#                -DUSE_MKL to build with MKL sparse matrix kernels
#                -DUSE_HPTT to build with optimized tensor transposition routines from HPTT library
DEFS        = -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK


### Optional: PROFILING AND TUNING
#uncomment below to enable performance profiling
#DEFS       += -DPROFILE -DPMPI
#uncomment below to enable automatic performance tuning (loses reproducibility of results)
#recommended usage is to run model_trainer with -DTUNE at scale for suitable duration to obtain suitable architecutral parameters, 
#then recompile with parameters without -DTUNE
#Note: -DTUNE requires lapack (include -mkl or -llapack in LIBS) and the inclusion of above performance profiling flags
#DEFS       += -DTUNE

### Optional: DEBUGGING AND VERBOSITY
#uncomment below to enable CTF execution output (1 for basic contraction information on start-up and contractions)
#DEFS       += -DVERBOSE=1
#uncomment to set debug level to dump information about mapping and internal CTF actions and activate asserts
#DEFS       += -DDEBUG=1

### FULL COMPILE COMMAND AND LIBRARIES
#used to compile all plain C++ files
FCXX        = $(CXX) $(CXXFLAGS) $(DEFS) $(INCLUDES)
#link-line for all executables
LIBS        = $(LIB_PATH) $(LIB_FILES) $(LINKFLAGS)
#compiler for CUDA files (used to compile CUDA code only when -DOFFLOAD and -DUSE_CUDA are in DEFS, otherwise should be same as FCXX with -x c++)
OFFLOAD_CXX = $(CXX) -x c++ $(CXXFLAGS) $(DEFS) $(INCLUDES)

Please advise on how to achieve a successful build.

Assertion failure in readwrite

Attached program hits an assertion:

src/redistribution/sparse_rw.cxx:708 via
src/redistribution/sparse_rw.cxx:985 via
src/tensor/untyped_tensor.cxx:1052

Assertion fails on 48 cores, works on 1 core.

test.tar.gz

Testing 3D DFT with n = 6: make: *** [test] Segmentation fault: 11

Is this failure expected?

agraback-mobl1:ctf jrhammon$ cat how-did-i-configure 
./configure 'CXX=/opt/mpich/dev/clang/default/bin/mpicxx' '--blas=-framework Accelerate'

agraback-mobl1:ctf jrhammon$ make test
/Applications/Xcode.app/Contents/Developer/usr/bin/make test_suite -C test
make[1]: Nothing to be done for `test_suite'.
/Users/jrhammon/Work/CHEMISTRY/AQUARIUS/ctf/bin/test_suite
Testing Cyclops Tensor Framework using 1 processors
Testing non-symmetric: NS = NS*NS weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing symmetric: SY = SY*SY weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing (anti-)skew-symmetric: AS = AS*AS weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing symmetric-hollow: SH = SH*SH weigh with n = 6:
{ C["ijkl"] = A["ijkl"]*B["ijkl"] } passed
Testing CCSDT T3->T2 with n= 6, m = 7:
{ AS_C["abij"] += 0.5*AS_A["mnje"]*AS_B["abeimn"] } passed
Testing non-symmetric: NS = NS*NS matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 0 sp 1.000000), B (36*36 sym 0 sp 1.000000), C (36*36 sym 0 sp 1.000000) } passed 
Testing symmetric: SY = SY*SY matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 1 sp 1.000000), B (36*36 sym 1 sp 1.000000), C (36*36 sym 1 sp 1.000000) } passed 
Testing (anti-)skew-symmetric: AS = AS*AS matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 2 sp 1.000000), B (36*36 sym 2 sp 1.000000), C (36*36 sym 2 sp 1.000000) } passed 
Testing symmetric-hollow: SH = SH*SH matmul with n = 36:
{ C["ik"] += A["ij"]*B["jk"] with A (36*36 sym 3 sp 1.000000), B (36*36 sym 3 sp 1.000000), C (36*36 sym 3 sp 1.000000) } passed 
Testing non-symmetric: NS = NS*NS 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing symmetric: SY = SY*SY 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing (anti-)skew-symmetric: AS = AS*AS 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing symmetric-hollow: SH = SH*SH 4D gemm with n = 6:
{ (A["ijmn"]*B["mnpq"])*C["pqkl"] = A["ijmn"]*(B["mnpq"]*C["pqkl"]) } passed
Testing scalar operations
{ scalar tests } passed
Testing a 2D trace operation with n = 6:
{ tr(ABCD) = tr(DABC) = tr(CDAB) = tr(BCDA) } passed
Testing a diag sym operation with n = 6:
{ (A["(ab)(ij)"]=mA["ii"]-mB["aa"]=mA["jj"]-mB["bb"] } passed 
Testing a diag ctr operation with n = 6 m = 36:
{ sum(ai)A["aiai"]=sum(ai)mA["ai"] } passed 
Testing fast symmetric multiplication operation with n = 36:
{ C["(ij)"] = A["(ik)"]*B["(kj)"] } passed
Testing 4D fast symmetric contraction operation with n = 6:
{ C["(ij)ab"] = A["(ik)al"]*B["(kj)lb"] } passed
Testing multi-tensor symmetric contraction with m = 36 n = 6:
{ A["ik"]*A["jk"] = C_NS["ij"] = C_AS["ij"] } passed.
Testing gemm on subworld algorithm with n,m,k = 36 div = 3:
{ GEMM on subworlds } passed
Testing non-symmetric Strassen's algorithm with n = 72:
{ Strassen's algorithm via slicing } passed
Testing diagonal write with n = 6:
{ diagonal write test } passed
Testing readall test with n = 6 m = 36:
{ sum(ai)A["aiai"]=sum(ai)mA["ai"] } passed 
Testing repack with n = 6:
{ NS -> SY -> NS repack } passed 
Testing SY times NS with n = 6:
{ C["(ij)"]=A["(ij)"]*B["ijkl"] } passed 
Testing non-symmetric sliced GEMM algorithm with (16 32 8):
{ GEMM with parallel slicing } passed
Testing 1D DFT with n = 36:
{ DFT["ik"] = DFT["ij"]*IDFT["jk"] } passed
Testing 3D DFT with n = 6:
make: *** [test] Segmentation fault: 11

Implement tril() and triu() for CTF matrices

cc @RagnaroWA
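Below is a minimal reference sketch of the intended semantics. It assumes NumPy-style conventions: `tril` keeps entries with j <= i + k and `triu` keeps entries with j >= i + k. It works on a dense, column-major nrow-by-ncol buffer. The function names and the offset parameter `k` are illustrative assumptions, not an existing CTF API. A distributed version would apply the same predicate to each (global index, value) pair.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference implementations (not part of CTF).
// A is an nrow x ncol matrix stored column-major: A[i + j*nrow].
// k = 0 selects the main diagonal, k > 0 diagonals above it, k < 0 below it.

// Keep entries on or below diagonal k (j <= i + k); zero the rest.
std::vector<double> tril(const std::vector<double>& A, int64_t nrow, int64_t ncol, int64_t k = 0) {
  assert(static_cast<int64_t>(A.size()) == nrow * ncol);
  std::vector<double> B(A);
  for (int64_t j = 0; j < ncol; ++j)
    for (int64_t i = 0; i < nrow; ++i)
      if (j > i + k) B[i + j * nrow] = 0.0;
  return B;
}

// Keep entries on or above diagonal k (j >= i + k); zero the rest.
std::vector<double> triu(const std::vector<double>& A, int64_t nrow, int64_t ncol, int64_t k = 0) {
  assert(static_cast<int64_t>(A.size()) == nrow * ncol);
  std::vector<double> B(A);
  for (int64_t j = 0; j < ncol; ++j)
    for (int64_t i = 0; i < nrow; ++i)
      if (j < i + k) B[i + j * nrow] = 0.0;
  return B;
}
```

With this convention, `tril(A, n, m, k)` and `triu(A, n, m, k + 1)` partition the entries of A, so their sum reproduces A.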

Bug: taking powers of scalars returns zero

Taking powers of a tensor works as expected, unless the tensor is a scalar (zero-dimensional). Then taking powers returns zero.

>>> import ctf
>>> a = ctf.astensor([2])
>>> a
array([2])
>>> a ** 2
array([4])
>>> b = ctf.astensor(2)
>>> b
array(2)
>>> b ** 2
array(0)

account for GPU memory capacity

Currently, only the available memory in DRAM is accounted for, not on-device (GPU) memory usage.

support for shadow execution

Add functionality to estimate cost and memory usage for a set of tensor operations without actually executing them.

CTF failed to build on macOS. Error: variable has incomplete type 'struct mallinfo'

Hi all, I tried building CTF on my MacBook, and the build fails with an error caused by the forward declaration of 'mallinfo' in memcontrol.cxx under src/shared. I pasted the full output below for reference.

atanteck@Adrians-MacBook-Pro:/Desktop/cyclops/ctf$ make
/Library/Developer/CommandLineTools/usr/bin/make ctflib
/Library/Developer/CommandLineTools/usr/bin/make ctf -C src;
/Library/Developer/CommandLineTools/usr/bin/make -C interface
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c common.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/common.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c flop_counter.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/flop_counter.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c world.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/world.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c idx_tensor.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/idx_tensor.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c term.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/term.o
In file included from term.cxx:4:
In file included from ./idx_tensor.h:4:
In file included from ./../scaling/../interface/term.h:4:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/map:442:
/Library/Developer/CommandLineTools/usr/include/c++/v1/__tree:1819:22: warning:
the specified comparator type does not provide a const call operator
[-Wuser-defined-warnings]
__trigger_diagnostics()), "");
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/set:400:28: note: in
instantiation of member function 'std::__1::__tree<CTF::Idx_Tensor *,
CTF_int::tensor_name_less, std::__1::allocator<CTF::Idx_Tensor *>
>::__tree' requested here
class _LIBCPP_TEMPLATE_VIS set
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/__tree:970:7: note: from
'diagnose_if' attribute on '__trigger_diagnostics':
..._LIBCPP_DIAGNOSE_WARNING(!__invokable<_Compare const&, _Tp const&, _Tp const&>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Library/Developer/CommandLineTools/usr/include/c++/v1/__config:1130:20: note:
expanded from macro '_LIBCPP_DIAGNOSE_WARNING'
__attribute__((diagnose_if(__VA_ARGS__, "warning")))
^ ~~~~~~~~~~~
1 warning generated.
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c schedule.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/schedule.o
In file included from schedule.cxx:2:
In file included from ./schedule.h:5:
In file included from ./idx_tensor.h:4:
In file included from ./../scaling/../interface/term.h:4:
In file included from /Library/Developer/CommandLineTools/usr/include/c++/v1/map:442:
/Library/Developer/CommandLineTools/usr/include/c++/v1/__tree:1819:22: warning:
the specified comparator type does not provide a const call operator
[-Wuser-defined-warnings]
__trigger_diagnostics()), "");
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/set:400:28: note: in
instantiation of member function 'std::__1::__tree<CTF::Idx_Tensor *,
CTF_int::tensor_name_less, std::__1::allocator<CTF::Idx_Tensor *>
>::~__tree' requested here
class _LIBCPP_TEMPLATE_VIS set
^
/Library/Developer/CommandLineTools/usr/include/c++/v1/__tree:970:7: note: from
'diagnose_if' attribute on '__trigger_diagnostics':
..._LIBCPP_DIAGNOSE_WARNING(!__invokable<_Compare const&, _Tp const&, _Tp const&>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Library/Developer/CommandLineTools/usr/include/c++/v1/__config:1130:20: note:
expanded from macro '_LIBCPP_DIAGNOSE_WARNING'
__attribute__((diagnose_if(__VA_ARGS__, "warning")))
^ ~~~~~~~~~~~
1 warning generated.
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c semiring.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/semiring.o
In file included from semiring.cxx:1:
./set.h:797:22: warning: format specifies type 'long' but the argument has type
'int64_t' (aka 'long long') [-Wformat]
fprintf(fp,"%ld",((int64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~
%lld
./set.h:802:22: warning: format specifies type 'unsigned long' but the argument
has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
fprintf(fp,"%lu",((uint64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~~
%llu
2 warnings generated.
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c partition.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/partition.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c fun_term.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/fun_term.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c monoid.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/monoid.o
In file included from monoid.cxx:2:
./set.h:797:22: warning: format specifies type 'long' but the argument has type
'int64_t' (aka 'long long') [-Wformat]
fprintf(fp,"%ld",((int64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~
%lld
./set.h:802:22: warning: format specifies type 'unsigned long' but the argument
has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
fprintf(fp,"%lu",((uint64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~~
%llu
2 warnings generated.
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c set.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/set.o
In file included from set.cxx:1:
./set.h:797:22: warning: format specifies type 'long' but the argument has type
'int64_t' (aka 'long long') [-Wformat]
fprintf(fp,"%ld",((int64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~
%lld
./set.h:802:22: warning: format specifies type 'unsigned long' but the argument
has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
fprintf(fp,"%lu",((uint64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~~
%llu
2 warnings generated.
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c ring.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/ring.o
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:5:
./set.h:797:22: warning: format specifies type 'long' but the argument has type
'int64_t' (aka 'long long') [-Wformat]
fprintf(fp,"%ld",((int64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~
%lld
./set.h:802:22: warning: format specifies type 'unsigned long' but the argument
has type 'uint64_t' (aka 'unsigned long long') [-Wformat]
fprintf(fp,"%lu",((uint64_t*)a)[0]);
~~~ ^~~~~~~~~~~~~~~~~
%llu
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:56:37: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, "%ld%n", ind+0, &aptr);
~~~ ^~~~~
%lld
./tensor.cxx:906:14: note: in instantiation of function template specialization
'CTF_int::parse_sparse_tensor_data' requested here
CTF_int::parse_sparse_tensor_data(datastr, T->order, (dtype)...
^
./tensor.cxx:919:5: note: in instantiation of function template specialization
'CTF::read_sparse_from_file_base' requested here
read_sparse_from_file_base(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:59:40: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, " %ld%n", ind+j, &aptr);
~~~ ^~~~~
%lld
./graph_io_aux.cxx:56:37: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, "%ld%n", ind+0, &aptr);
~~~ ^~~~~
%lld
./tensor.cxx:906:14: note: in instantiation of function template specialization
'CTF_int::parse_sparse_tensor_data' requested here
CTF_int::parse_sparse_tensor_data(datastr, T->order, (dtype)...
^
./tensor.cxx:924:5: note: in instantiation of function template specialization
'CTF::read_sparse_from_file_base' requested here
read_sparse_from_file_base(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:59:40: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, " %ld%n", ind+j, &aptr);
~~~ ^~~~~
%lld
./graph_io_aux.cxx:56:37: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, "%ld%n", ind+0, &aptr);
~~~ ^~~~~
%lld
./tensor.cxx:906:14: note: in instantiation of function template specialization
'CTF_int::parse_sparse_tensor_data' requested here
CTF_int::parse_sparse_tensor_data(datastr, T->order, (dtype)...
^
./tensor.cxx:929:5: note: in instantiation of function template specialization
'CTF::read_sparse_from_file_base' requested here
read_sparse_from_file_base(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:59:40: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, " %ld%n", ind+j, &aptr);
~~~ ^~~~~
%lld
./graph_io_aux.cxx:56:37: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, "%ld%n", ind+0, &aptr);
~~~ ^~~~~
%lld
./tensor.cxx:906:14: note: in instantiation of function template specialization
'CTF_int::parse_sparse_tensor_data' requested here
CTF_int::parse_sparse_tensor_data(datastr, T->order, (dtype)...
^
./tensor.cxx:934:5: note: in instantiation of function template specialization
'CTF::read_sparse_from_file_base' requested here
read_sparse_from_file_base<int64_t>(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:59:40: warning: format specifies type 'long *' but the
argument has type 'int64_t *' (aka 'long long *') [-Wformat]
sscanf(lvals[i]+ptr, " %ld%n", ind+j, &aptr);
~~~ ^~~~~
%lld
./graph_io_aux.cxx:114:44: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./tensor.cxx:945:31: note: in instantiation of function template specialization
'CTF_int::serialize_sparse_tensor_data' requested here
char * datastr = CTF_int::serialize_sparse_tensor_data(T->ord...
^
./tensor.cxx:953:5: note: in instantiation of function template specialization
'CTF::write_sparse_to_file_base' requested here
write_sparse_to_file_base(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:116:47: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:140:50: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:142:53: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:114:44: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./tensor.cxx:945:31: note: in instantiation of function template specialization
'CTF_int::serialize_sparse_tensor_data' requested here
char * datastr = CTF_int::serialize_sparse_tensor_data(T->ord...
^
./tensor.cxx:958:5: note: in instantiation of function template specialization
'CTF::write_sparse_to_file_base' requested here
write_sparse_to_file_base(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:116:47: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:140:50: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:142:53: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:114:44: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./tensor.cxx:945:31: note: in instantiation of function template specialization
'CTF_int::serialize_sparse_tensor_data' requested here
char * datastr = CTF_int::serialize_sparse_tensor_data(T->ord...
^
./tensor.cxx:963:5: note: in instantiation of function template specialization
'CTF::write_sparse_to_file_base' requested here
write_sparse_to_file_base(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:116:47: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:140:50: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:142:53: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:114:44: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./tensor.cxx:945:31: note: in instantiation of function template specialization
'CTF_int::serialize_sparse_tensor_data' requested here
char * datastr = CTF_int::serialize_sparse_tensor_data(T->ord...
^
./tensor.cxx:968:5: note: in instantiation of function template specialization
'CTF::write_sparse_to_file_base' requested here
write_sparse_to_file_base<int64_t>(fpath, with_vals, rev_order, this);
^
In file included from ring.cxx:1:
In file included from ./../../include/ctf.hpp:17:
In file included from ./../../include/../src/interface/tensor.h:1287:
./graph_io_aux.cxx:116:47: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
astr_len += snprintf(NULL, 0, " %ld", ind[j]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:140:50: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, "%ld", ind[0]);
~~~ ^~~~~~
%lld
./graph_io_aux.cxx:142:53: warning: format specifies type 'long' but the
argument has type 'int64_t' (aka 'long long') [-Wformat]
str_ptr += sprintf(datastr+str_ptr, " %ld", ind[j]);
~~~ ^~~~~~
%lld
26 warnings generated.
/Library/Developer/CommandLineTools/usr/bin/make -C shared
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c util.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/util.o
mpicxx -O3 -std=c++0x -DOMP_OFF -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -D_DARWIN_C_SOURCE -DFTN_UNDERSCORE=1 -DUSE_LAPACK -c memcontrol.cxx -o /Users/atanteck/Desktop/cyclops/ctf/obj/memcontrol.o
memcontrol.cxx:216:24: warning: format specifies type 'long' but the argument
has type 'int64_t' (aka 'long long') [-Wformat]
i, mem_used[i]);
^~~~~~~~~~~
memcontrol.cxx:223:31: warning: format specifies type 'long' but the argument
has type 'int64_t' (aka 'long long') [-Wformat]
mst.size(), mst_buffer_ptr);
^~~~~~~~~~~~~~
memcontrol.cxx:602:21: error: variable has incomplete type 'struct mallinfo'
struct mallinfo info;
^
memcontrol.cxx:602:12: note: forward declaration of 'mallinfo'
struct mallinfo info;
^
memcontrol.cxx:603:12: error: invalid use of incomplete type 'mallinfo'
info = mallinfo();
^~~~~~~~~~
memcontrol.cxx:602:12: note: forward declaration of 'mallinfo'
struct mallinfo info;
^
2 warnings and 2 errors generated.
make[3]: *** [/Users/atanteck/Desktop/cyclops/ctf/obj/memcontrol.o] Error 1
make[2]: *** [shared] Error 2
make[1]: *** [ctf_objs] Error 2
make: *** [/Users/atanteck/Desktop/cyclops/ctf/lib/libctf.a] Error 2
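The log shows two separate portability problems. The following is a minimal sketch of both, assuming a macOS or glibc/Linux target. It is not necessarily how CTF resolved the issue.

The first problem is the hard error. `struct mallinfo` is a glibc extension from `<malloc.h>`, and macOS does not define it. macOS instead exposes heap statistics through `malloc_zone_statistics` in `<malloc/malloc.h>`.

The second problem is the -Wformat warnings. They come from printing `int64_t`, which is `long long` on macOS, with `%ld`. The `PRId64` macro from `<cinttypes>` expands to the correct conversion on each platform.

The helper names `heap_bytes_in_use` and `format_mem_line` are hypothetical.

```cpp
#include <cassert>
#include <cinttypes>  // PRId64
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>    // on glibc systems this also defines __GLIBC__

#if defined(__APPLE__)
#include <malloc/malloc.h>  // malloc_zone_statistics, malloc_statistics_t
#elif defined(__GLIBC__)
#include <malloc.h>         // struct mallinfo / struct mallinfo2
#endif

// Hypothetical helper: bytes currently handed out by malloc,
// or 0 on platforms where no query is wired up.
int64_t heap_bytes_in_use() {
#if defined(__APPLE__)
  malloc_statistics_t stats;
  malloc_zone_statistics(NULL, &stats);  // NULL zone: totals over all zones
  return static_cast<int64_t>(stats.size_in_use);
#elif defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33))
  struct mallinfo2 info = mallinfo2();   // size_t fields, glibc >= 2.33
  return static_cast<int64_t>(info.uordblks);
#elif defined(__GLIBC__)
  struct mallinfo info = mallinfo();     // int fields, older glibc
  return static_cast<int64_t>(info.uordblks);
#else
  return 0;
#endif
}

// Hypothetical helper: PRId64 expands to "ld" or "lld" as appropriate,
// so this compiles without -Wformat warnings whether int64_t is long or long long.
int format_mem_line(char* buf, std::size_t len, int64_t bytes) {
  assert(buf != NULL && len > 0);
  return std::snprintf(buf, len, "heap in use: %" PRId64 " bytes", bytes);
}
```

On macOS the glibc branches are never compiled, so the `struct mallinfo` declaration that triggers the error is never seen. The same `PRId64`/`SCNd64` approach would apply to the `fprintf`, `sscanf` and `snprintf` calls flagged in set.h and graph_io_aux.cxx.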

fix Clang in Travis

See https://travis-ci.org/solomonik/ctf/jobs/275642887 for details.

Implement batched sparse CSR contractions for Hadamard indices in tensor contractions

Hadamard indices in contractions (indices appearing in A, B, and C simultaneously) are currently fully supported, but are slow in the presence of sparsity (see commit 44c09cb) because of the overhead of the CSR format in the current method. This enhancement should improve performance for the sparse cases significantly.

Issues with tensor scaling in summation and scalar*scalar

Some problems regarding the treatment of scaling factors need to be addressed:

when performing sums like scalar*A+B, scalar*A is currently treated as a contraction;
scalar*scalar does not seem to work correctly in parallel, possibly only with sparsity.
The former causes issues for the Python interface, e.g. C = -2*A+B does not work in parallel.

sparse tensor in ctf.einsum

For some contraction patterns, ctf.einsum crashes when the operands are sparse tensors:

n = 40
a1 = ctf.tensor((n,n,n), sp=1)
b1 = ctf.tensor((n,n,n), sp=1)
c1 = ctf.tensor((n,n,n))
a1.fill_sp_random(0., 1., 0.001)
b1.fill_sp_random(0., 1., 0.001)
c1.fill_sp_random(0., 1., 0.001)

ctf.einsum('ijk,jkl->ijkl', a1, b1)
ctf.einsum('ijk,jkl->ijkl', a1, c1)

NERSC Cori is undetected by configure script

Attempting to configure CTF on cori.nersc.gov returns:

> ./configure
Checking hostname... Hostname not recognized, assuming generic Linux host.
Checking compiler type/version... ./configure: line 233: mpicxx: command not found
./configure: line 234: mpicxx: command not found
Could not determine underlying C/C++ compilers.

Hacking configure to replace "edison" with "cori" seems to work.

Compile error on BG/Q

Compilation fails on BG/Q due to a dangling "else" in src/mapping/topology.cxx on line 144. It looks like the else should be deleted, but it is not clear whether that is the intent of the code.

The compiler used to build the code is BG Clang 3.7.

The build output is given below.

make -C mapping
make[3]: Entering directory `/gpfs/mira-home/justusc/src/ctf/src/mapping'
mpic++11 -O3 -std=c++11 -fopenmp   -DBGQ -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1  -I/soft/libraries/essl/current/include -c mapping.cxx
mpic++11 -O3 -std=c++11 -fopenmp   -DBGQ -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1  -I/soft/libraries/essl/current/include -c distribution.cxx
mpic++11 -O3 -std=c++11 -fopenmp   -DBGQ -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1  -I/soft/libraries/essl/current/include -c topology.cxx
topology.cxx:154:5: error: expected statement
    } else if (mach == TOPOLOGY_BGP) {
    ^
1 error generated.
make[3]: *** [topology.o] Error 1
make[3]: Leaving directory `/gpfs/mira-home/justusc/src/ctf/src/mapping'
make[2]: *** [mapping] Error 2
make[2]: Leaving directory `/gpfs/mira-home/justusc/src/ctf/src'
make[1]: *** [ctf] Error 2
make[1]: Leaving directory `/gpfs/mira-home/justusc/src/ctf'
make: *** [lib/libctf.a] Error 2

sparse error

import ctf,time,random
import numpy as np
import matplotlib.pyplot as plt
import numpy.linalg as la
from ctf import random as crandom
glob_comm = ctf.comm()

size_lb = 20
size_ub = 30
sparsity = 0.1
I = random.randint(size_lb,size_ub)
J = random.randint(size_lb,size_ub)
K = random.randint(size_lb,size_ub)

t1 = ctf.tensor((I, J, K), sp=True)
#t2 = ctf.tensor((I, J, K), sp=True)
#t3 = ctf.tensor((I, J, K), sp=True)

#t1 = ctf.tensor((I, J, K))
t2 = ctf.tensor((I, J, K))
t3 = ctf.tensor((I, J, K))

t1.fill_sp_random(0,1,sparsity)
#t2.fill_sp_random(0,1,sparsity)
#t1.fill_random(0, 1)
t2.fill_random(0, 1)
t3.i("ijk") << t1.i("ijk") * t2.i("ijk")
print (ctf.vecnorm(t3))

Python interface error on sparse tensor einsum

When using ctf.einsum on sparse tensors (created with sp=1):

n = 11
a1 = ctf.tensor((n,n,n), sp=1)

ctf.einsum sometimes works correctly, but at other times it fails with one of several different errors.
The issue may appear only on macOS.
Possible errors:

Checksum error

.python(98705,0x7fff8b2d9380) malloc: *** error for object 0x7f8f0fc17cd0: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap:

The resulting tensor is incorrect (e.g. when running test_sparse.py in test/python):

.F...
======================================================================
FAIL: test_einsum_hadamard (__main__.KnowValues)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_sparse.py", line 32, in test_einsum_hadamard
    self.assertTrue(allclose(d2,e2))
AssertionError: False is not true

test_sparse.py in test/python hangs and has to be quit manually:

Tests for sparse functionality
.Killed: 9

Segmentation fault

.Segmentation fault: 11

read_local memory usage

read_local uses roughly 7X * tensor_size of memory (assuming type double and separate index/value arrays). Implementing streaming, as in read/write, would lower this to 3X * tensor_size (the minimum: original data plus key/value pairs).

Fix SVD for complex datatypes

Need to treat singular values as non-complex types.

-std=c++11 missing with clang on OSX

The -std=c++11 flag is missing in config.mk, even though C++11 is checked and passes during configure. Specs:

OSX 10.11
clang 3.5.0 (homebrew)
openmpi 1.10

Support for SVD or HOSVD?

Has CTF implemented SVD or a higher-order SVD (HOSVD) method for either dense or sparse tensors?

Hermitian tensors

Add native CTF support for Hermitian tensors.

segmentation fault in contraction with many indices

As a test for writing more understandable code for so-called quark line contractions for the construction of observables in lattice QCD, I have attempted to use Cyclops.

Unfortunately, I encounter a segmentation fault in the following code:
https://github.com/kostrzewa/nyom/blob/master/benchmarks/bench_ctf/bench_ctf.cpp

It is a contraction of two six-index objects (whose indices run over different ranges) with a number of two-index objects, resulting in a two-index object (a 3x3 complex matrix).

If you have some time, I would really appreciate it if you could check whether I'm doing something wrong that might cause the segfault. The code should compile straightforwardly with the included Makefile.

Performance of Simple Program

In the past, I've used MATLAB to do some tensor calculations. My tensors are starting to get too big for single-node memory. I heard about CTF and decided that I would try it. I wrote a very simple piece of code that does one of my tensor operations. I started with random data just to gauge the performance and memory usage. The full code is below. The important part is the operation line:

A["ijkl"] = L["ik"]*I["ik"]*O["ikl"]*M["lj"];

My MATLAB program does this in ~1s (using bsxfun). During that short time, it uses multiple threads (maybe 2-4). The CTF code below takes ~45s using mpiexec -np 4 ./simple_test with gcc, and about the same with the Intel compiler and MKL. I used the same flags that were auto-generated by the ./configure script. I suspect that I haven't written optimal CTF code. Do you have any suggestions about how to optimize this code?

#include <ctf.hpp>
using namespace CTF;

int main(int argc, char ** argv){
  int np, rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &np);

  {
    // Scope the World and tensors so they are destroyed before MPI_Finalize
    World dw(argc, argv);

    int n_theta = 968, n_w = 7740, n_e = 3, n_E = 16;

    Matrix<double> L(n_theta,n_w,dw);
    Matrix<double> I(n_theta,n_w,dw);
    Matrix<double> M(n_e,n_E,dw);
    int shapeLIO[] = {NS, NS, NS};
    int sizeLIO[] = {n_theta, n_w, n_e};
    Tensor<double> O(3, sizeLIO, shapeLIO, dw);

    int shapeA2[] = {NS, NS, NS, NS};
    int sizeA2[] = {n_theta, n_E, n_w, n_e};
    Tensor<double> A(4, sizeA2, shapeA2, dw);

    srand48(dw.rank);
    O.fill_random(0.0,1.0);
    I.fill_random(0.0,1.0);
    M.fill_random(0.0,1.0);
    L.fill_random(0.0,1.0);

    A["ijkl"] = L["ik"]*I["ik"]*O["ikl"]*M["lj"];
  }

  MPI_Finalize();
  return 0;
}
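One way to reason about the cost starts from the index pattern. Every index of the expression appears in the output, so nothing is summed, and each element is a plain product: A[i,j,k,l] = L[i,k]*I[i,k]*O[i,k,l]*M[l,j].

The output is large. A alone holds 968*16*7740*3 = 359,631,360 doubles, about 2.9 GB, so writing A is a large part of the work.

The plain-loop sketch below is not CTF code. Its row-major layout and the `Dims`/`eval_*` names are assumptions for illustration. It shows that the three-factor product L*I*O depends only on (i,k,l), so it can be formed once, which leaves a single multiply per element of A. In CTF the analogous untested restructuring would be to first form a three-index intermediate from L, I and O, then multiply it by M.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shapes as in the issue: L, I: ni x nk; O: ni x nk x nl; M: nl x nj;
// A: ni x nj x nk x nl. All arrays are dense and row-major here.
struct Dims { std::size_t ni, nj, nk, nl; };

// Direct evaluation: 3 multiplies per element of A.
std::vector<double> eval_direct(const Dims& d, const std::vector<double>& L,
                                const std::vector<double>& I, const std::vector<double>& O,
                                const std::vector<double>& M) {
  std::vector<double> A(d.ni * d.nj * d.nk * d.nl);
  for (std::size_t i = 0; i < d.ni; ++i)
    for (std::size_t j = 0; j < d.nj; ++j)
      for (std::size_t k = 0; k < d.nk; ++k)
        for (std::size_t l = 0; l < d.nl; ++l)
          A[((i * d.nj + j) * d.nk + k) * d.nl + l] =
              L[i * d.nk + k] * I[i * d.nk + k] * O[(i * d.nk + k) * d.nl + l] * M[l * d.nj + j];
  return A;
}

// Factored evaluation: T[i,k,l] = L[i,k]*I[i,k]*O[i,k,l] costs 2*ni*nk*nl multiplies,
// then A[i,j,k,l] = T[i,k,l]*M[l,j] costs 1 multiply per element of A.
std::vector<double> eval_factored(const Dims& d, const std::vector<double>& L,
                                  const std::vector<double>& I, const std::vector<double>& O,
                                  const std::vector<double>& M) {
  std::vector<double> T(d.ni * d.nk * d.nl);
  for (std::size_t i = 0; i < d.ni; ++i)
    for (std::size_t k = 0; k < d.nk; ++k)
      for (std::size_t l = 0; l < d.nl; ++l)
        T[(i * d.nk + k) * d.nl + l] = L[i * d.nk + k] * I[i * d.nk + k] * O[(i * d.nk + k) * d.nl + l];

  std::vector<double> A(d.ni * d.nj * d.nk * d.nl);
  for (std::size_t i = 0; i < d.ni; ++i)
    for (std::size_t j = 0; j < d.nj; ++j)
      for (std::size_t k = 0; k < d.nk; ++k)
        for (std::size_t l = 0; l < d.nl; ++l)
          A[((i * d.nj + j) * d.nk + k) * d.nl + l] = T[(i * d.nk + k) * d.nl + l] * M[l * d.nj + j];
  return A;
}
```

Both functions multiply in the same order, so they produce bitwise-identical results. The factored form trades a small ni*nk*nl intermediate for two-thirds fewer multiplies on the large output.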

Travis CI support

I have been setting up Travis CI for a number of other projects and wonder if we should do this for CTF as well.

It's pretty easy to do. MADNESS is the closest project in terms of dependencies that I have touched. See https://github.com/m-a-d-n-e-s-s/madness/blob/master/.travis.yml and https://github.com/m-a-d-n-e-s-s/madness/tree/master/ci for examples. I prefer to script by dependencies rather than OS (e.g. https://github.com/ParRes/Kernels/tree/master/travis), but it really doesn't matter.

I can setup the build part, but you guys will need to create useful regression tests that make sense to run inside an AWS container with 2 cores and 4 GB of memory.

The other step is that @solomonik needs to sign up with Travis using Github credentials and flip the switch to enable Travis. Make sure to set "Build only if .travis.yml is present" via https://travis-ci.org/solomonik/ctf/settings after you've signed up and activated Travis for CTF at https://travis-ci.org/profile/solomonik.

Travis supports GCC and Clang easily enough, but Intel compiler support may be available soon via this project. I still need to figure out the encryption steps for the license key, but I'll provide that when necessary.

Once we do this for CTF, it makes sense to do it for AQUARIUS...

parallel random number generation

(continuing discussion from https://github.com/solomonik/ctf/issues/27#issuecomment-251461600 in a better place)

(Sorry for the off-topic discussion.) CTF currently requires C++11. I am happy to learn C++11 has more robust support for RNGs and will keep that in mind. However, I am not sure they provide a solution to the underlying problem with parallel RNG generation for a library like CTF. One can just generate non-deterministically, but determinism is extremely valuable for reproducible testing. However, then one has to ensure that different processors get a different seed and different tensors get a different seed, and it becomes not so clear how to do that (e.g. on a CTF instance basis or via a global counter) and how to let the user control it.

I guess the right solution might be to have fill_random_nondeterministic and fill_random_preseeded. Then it remains to figure out whether fill_random_preseeded should give all processor ranks the same seed or different seeds, and how to handle the former, which is where C++11 RNGs may help. It also remains to decide what fill_random should do for backward compatibility.

I recall @poulson worked on this w.r.t. Elemental a while back but I cannot find the emails about it. However, it had something to do with Random123 (see this and this for details). I can't tell if this is implemented in https://github.com/elemental/Elemental/tree/master/include/El/core/random though.
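Below is a minimal sketch of the "preseeded" variant discussed above, using only C++11 `<random>`. Each rank derives its own stream from (base seed, rank, tensor id) via `std::seed_seq`. Reruns with the same inputs therefore reproduce the same values, while different ranks and tensors get different streams.

The function name and the notion of a per-tensor id are assumptions, not CTF API. This is seed derivation, not a counter-based generator like Random123. A counter-based generator would additionally make each value independent of how elements are distributed across ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: n values in [lo, hi) drawn from a stream that depends
// only on (base_seed, rank, tensor_id).
std::vector<double> preseeded_values(std::uint64_t base_seed, int rank, std::uint64_t tensor_id,
                                     std::size_t n, double lo, double hi) {
  assert(lo <= hi);
  // seed_seq consumes 32-bit words, so split the 64-bit inputs.
  std::seed_seq seq{static_cast<std::uint32_t>(base_seed),
                    static_cast<std::uint32_t>(base_seed >> 32),
                    static_cast<std::uint32_t>(rank),
                    static_cast<std::uint32_t>(tensor_id),
                    static_cast<std::uint32_t>(tensor_id >> 32)};
  std::mt19937_64 gen(seq);

  std::vector<double> out(n);
  for (double& v : out) {
    // Use the top 53 bits to get a double in [0, 1). This avoids
    // std::uniform_real_distribution, whose output may differ between
    // standard library implementations; mt19937_64 and seed_seq are fully specified.
    double u = static_cast<double>(gen() >> 11) * (1.0 / 9007199254740992.0);
    v = lo + (hi - lo) * u;
  }
  return out;
}
```

Under this scheme, fill_random_preseeded with a common base seed would still give every rank a distinct stream. Passing the same rank value everywhere would instead reproduce the "same seed on all ranks" behavior.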

Transform crash under MPI

I had trouble with the following snippet while trying to do element-wise multiplication:

    vector<int> inds = {1,1};
    CTF::Tensor<> A(2,inds.data());
    A["ij"] = 1.;
    A.print();
    CTF::Tensor<> S(1,inds.data());
    S["i"] = 2.;
    S.print();
    CTF::Transform<double,double>([](double s, double& a){ a*=s; })(S["j"],A["ij"]); //!!
    A.print();

This runs fine with one MPI process but segfaults with two or more. Swapping S["j"] -> S["i"] works, but the tensor isn't always in that format.

The easy workaround for this (I assume) is to instead replace that line with A["ij"]=S["i"]*A["ij"];
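For reference, the element-wise semantics the Transform call is expected to have can be written as plain loops. The sketch below assumes a column-major m-by-n buffer for A. The function names and layout are illustrative, not CTF internals.

Pairing S["j"] with A["ij"] scales column j by S[j], whereas pairing S["i"] with A["ij"] scales row i by S[i]. The two forms are therefore interchangeable only in special cases such as the 1x1 tensors above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A is m x n, column-major: A[i + j*m].

// Loop form of Transform([](double s, double& a){ a *= s; })(S["j"], A["ij"]):
// every entry in column j is multiplied by S[j].
void scale_columns(std::vector<double>& A, std::size_t m, std::size_t n,
                   const std::vector<double>& S) {
  assert(A.size() == m * n && S.size() == n);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i)
      A[i + j * m] *= S[j];
}

// Loop form of the S["i"] variant: every entry in row i is multiplied by S[i].
void scale_rows(std::vector<double>& A, std::size_t m, std::size_t n,
                const std::vector<double>& S) {
  assert(A.size() == m * n && S.size() == m);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i)
      A[i + j * m] *= S[i];
}
```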

Lambdas with GCC 4.7.3

Possible issue compiling CTF lambdas with gcc 4.7.3: the call to Transform is reported as ambiguous.

mpicxx -std=c++0x -x c++ -O3 -fopenmp -Wall -D_POSIX_C_SOURCE=200112L -D__STDC_LIMIT_MACROS -DFTN_UNDERSCORE=1 -c ../examples/btwn_central_kernels.cxx -o /root/ctf/obj/btwn_central_kernels.o -I../include/

call of overloaded ‘Transform(CTF::fill_sp_random_base(dtype, dtype, double, CTF::Tensor&) [with dtype = double]::<lambda(double&)>)’ is ambiguous
../include/../src/interface/tensor.cxx:784:5

ravel method in python wrapper

The tensor.ravel method returns only the data local to the current processor. It may be better to provide a separate method for manipulating the local/private data, and to make ravel equivalent to .reshape(-1).

Set::abs not set in some instances

In some instances (in particular, when calling Tensor::reduce(OP_MAXABS)), the abs field of Monoid mmax (src/tensor.cxx:650) is NULL, which leads to a segfault in seq_tsr_sum::run. It seems that this field is initially set to NULL in the default Set constructor and never later initialized with Set::set_abs_to_default(), a function that does not seem to be referenced anywhere. Should this be done in Set::Set()?

Questions about using CTF::Tensor::read_all()

If I write t.read_all(destination, true), for some CTF::Tensor t, are the contents stored into destination ordered according to the global index formula, that is, by a column-major enumeration?
Can I pass nullptr as the destination argument on all but one process, to limit copying?
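Consider an order-d tensor with lengths n_0, ..., n_{d-1}. A column-major (first index fastest) enumeration maps an index tuple (i_0, ..., i_{d-1}) to g = i_0 + n_0*(i_1 + n_1*(i_2 + ...)).

Below is a minimal sketch of that formula. The function name is hypothetical. Whether read_all uses exactly this ordering is what the question asks, so the code illustrates the formula only and says nothing about CTF's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: column-major global index g = i0 + n0*(i1 + n1*(i2 + ...)),
// evaluated Horner-style starting from the last (slowest-varying) index.
int64_t column_major_index(const std::vector<int64_t>& idx, const std::vector<int64_t>& lens) {
  assert(idx.size() == lens.size());
  int64_t g = 0;
  for (std::size_t d = idx.size(); d-- > 0;) {
    assert(idx[d] >= 0 && idx[d] < lens[d]);
    g = g * lens[d] + idx[d];
  }
  return g;
}
```

For lengths {2, 3, 4}, the tuples {1,0,0}, {0,1,0} and {1,2,3} map to 1, 2 and 23, the last being the final element of the 24-element tensor.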