Comments (27)
Thanks for asking about this. The slow performance is not too surprising: when the contraction is specified this way, CTF first computes a partial sum of the tensor over the j index, which yields a sparse output matrix and is slow due to the associated sparse data management. Instead, I would recommend using the CTF MTTKRP kernel, multiplying over the j index with a vector of all 1s. The general sparse contraction/summation kernels are orders of magnitude slower than the multi-tensor MTTKRP kernel (see https://arxiv.org/pdf/2106.07135.pdf for details). There are also other multi-tensor contraction kernels in CTF, and a general one is currently in development. These currently need to be invoked manually; when contractions are specified with einsum syntax, pairwise contractions are performed, and performance can suffer, e.g., if some computed intermediate quantities are sparse.
from ctf.
Thanks Edgar! Can you point me to the docs or the header files where these kernels are declared? I'm not sure how to go about finding them (a search on the doc site came up blank).
Sorry, yeah, the documentation of this may be incomplete. There is some (albeit Python-oriented) introduction to this in the tutorial:
https://github.com/cyclops-community/ctf/wiki/Tensor-introduction#multi-tensor-contraction
The currently available multi-tensor routines are mainly just MTTKRP and TTTP (a generalization of SDDMM). I think @raghavendrak may also have an implementation of TTMc for third-order tensors on another fork. This is the C++ interface file for them:
https://github.com/cyclops-community/ctf/blob/master/src/interface/multilinear.h
I'm sorry Edgar, can you give me a little more guidance on how to use the MTTKRP kernel in multilinear.h to compute the kernel I'm looking at (a["i"] = B["ijk"] * c["k"])? I can't tell from the comments how this works (it seems like if B is 3-dimensional, then the other operands need to be 2-dimensional).
MTTKRP performs A["ir"] = B["ijk"]*C["jr"]*D["kr"] (that should also be interpretable as a command by CTF in C++), so if A, C, D are n-by-1 and C["jr"]=1, you would recover the operation you are interested in as a special case. I think the mat_list interface will accept vectors too and automatically do this (if not, you can call the reshape function).
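To pin down what that special case computes, here is a dense, CTF-free loop-nest sketch (the names mttkrp_mode0 and ttv_jk are hypothetical helpers, not CTF API): the mode-0 MTTKRP with r = 1, an all-ones j-mode factor, and the k-mode factor set to c reproduces a["i"] = B["ijk"]*c["k"].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tens3 = std::vector<std::vector<std::vector<double>>>;
using Mat = std::vector<std::vector<double>>;

// Dense reference MTTKRP over mode 0:
// A[i][r] = sum_{j,k} B[i][j][k] * C[j][r] * D[k][r]
Mat mttkrp_mode0(const Tens3& B, const Mat& C, const Mat& D) {
    std::size_t I = B.size(), J = C.size(), K = D.size(), R = C[0].size();
    Mat A(I, std::vector<double>(R, 0.0));
    for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j)
            for (std::size_t k = 0; k < K; ++k)
                for (std::size_t r = 0; r < R; ++r)
                    A[i][r] += B[i][j][k] * C[j][r] * D[k][r];
    return A;
}

// Direct evaluation of a[i] = sum_{j,k} B[i][j][k] * c[k]
std::vector<double> ttv_jk(const Tens3& B, const std::vector<double>& c) {
    std::vector<double> a(B.size(), 0.0);
    for (std::size_t i = 0; i < B.size(); ++i)
        for (std::size_t j = 0; j < B[i].size(); ++j)
            for (std::size_t k = 0; k < c.size(); ++k)
                a[i] += B[i][j][k] * c[k];
    return a;
}
```

With R = 1, C[j][0] = 1, and D[k][0] = c[k], the column A[i][0] coincides with a[i].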
Thanks for your help Edgar! I think I've implemented this as you imagined:
int x = 12092;
int y = 9184;
int z = 28818;
int lens[] = {x, y, z};
Tensor<double> B(3, true /* is_sparse */, lens, dw);
Vector<double> a(x, dw); // output over i (mode 0)
Vector<double> c(y, dw); // factor over j (mode 1) -- must have length y, not z
Vector<double> d(z, dw); // factor over k (mode 2)
c.fill_random(1.0, 1.0); // fill with all ones
d.fill_random(1.0, 1.0);
a.fill_random(0.0, 0.0); // zero the output
Tensor<double>* mats[] = {&a, &c, &d};
MTTKRP(&B, mats, 0, false);
I was unsure about the last argument and where to put the output tensor, but it seemed to work out?
Using this specialized kernel removed a good portion of the overhead, bringing the time from 2 seconds to 90 ms (vs. 10 ms hand-coded).
The output tensor is the 0th argument since you passed 0 to the kernel, so that looks fine (I assume B is filled with nonzeros elsewhere). We recorded 2-4X overhead relative to SPLATT's MTTKRP, which uses CSF. The current implementation uses a less compact default sparse tensor format than CSF (we are working on integrating CSF into this and other kernels) and is minimally optimized.
One last question. If I understand correctly, the above approach isn't going to work out if I want to run a slightly different SpTTV kernel: a["ij"] = B["ijk"] * c["k"]?
Yes, for that, just performing the CTF contraction as usual would be advisable. Choosing A to be sparse or dense will lead to different performance, depending on the sparsity of B. The multi-tensor kernels we've introduced so far are driven by common applications in tensor decomposition methods.
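For reference, the semantics of that SpTTV a["ij"] = B["ijk"] * c["k"] are just the following loop nest (a dense, CTF-free sketch; ttv_mode2 is a hypothetical name, not part of CTF):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tens3b = std::vector<std::vector<std::vector<double>>>;
using Matb = std::vector<std::vector<double>>;

// a[i][j] = sum_k B[i][j][k] * c[k]  -- dense reference for the mode-2 TTV
Matb ttv_mode2(const Tens3b& B, const std::vector<double>& c) {
    Matb a(B.size(), std::vector<double>(B[0].size(), 0.0));
    for (std::size_t i = 0; i < B.size(); ++i)
        for (std::size_t j = 0; j < B[i].size(); ++j)
            for (std::size_t k = 0; k < c.size(); ++k)
                a[i][j] += B[i][j][k] * c[k];
    return a;
}
```

Unlike the MTTKRP case above, the j index survives into the output, which is why a single summed-out MTTKRP factor cannot express it.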
Hi again Edgar, I'm trying to use the MTTKRP function to actually compute an MTTKRP. Here's my code, which should be evaluating the contraction A[i, l] = B[i, j, k] * C[j, l] * D[k, l].
void mttkrp(int nIter, int warmup, std::string filename, std::vector<int> dims, World& dw, int ldim) {
  Tensor<double> B(3, true /* is_sparse */, dims.data(), dw);
  Matrix<double> A(dims[0], ldim, dw);
  Matrix<double> C(dims[1], ldim, dw);
  Matrix<double> D(dims[2], ldim, dw);
  C.fill_random(1.0, 1.0);
  D.fill_random(1.0, 1.0);
  B.read_sparse_from_file(filename.c_str());
  if (dw.rank == 0) {
    std::cout << "Read " << B.nnz_tot << " non-zero entries from the file." << std::endl;
  }
  Tensor<double>* mats[] = {&A, &C, &D};
  auto avgMs = benchmarkWithWarmup(warmup, nIter, [&]() {
    MTTKRP(&B, mats, 0, false);
  });
  if (dw.rank == 0) {
    std::cout << "Average execution time: " << avgMs << " ms." << std::endl;
  }
}
This code segfaults with the following backtrace:
(gdb) bt
#0 0x00000000100302a4 in CTF_int::handler() ()
#1 0x00000000100190b4 in CTF::Semiring<double, true>::MTTKRP(int, long*, int*, long, long, int, bool, CTF::Pair<double> const*, double const* const*, double*) ()
#2 0x0000000010019a1c in void CTF::MTTKRP<double>(CTF::Tensor<double>*, CTF::Tensor<double>**, int, bool) ()
#3 0x000000001000ebf8 in std::_Function_handler<void (), mttkrp(int, int, std::string, std::vector<int, std::allocator<int> >, CTF::World&, int)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#4 0x000000001000e740 in mttkrp(int, int, std::string, std::vector<int, std::allocator<int> >, CTF::World&, int) ()
#5 0x0000000010008a98 in main ()
Is there anything here that looks incorrect to you? This code was adapted from the working version you helped me with above.
Somewhat related, I was looking at the TTTP implementation, trying to implement SDDMM. Based on the comment at least, I'm not sure how to use the handwritten function to implement the contraction A[i, k] = B[i, k] * C[i, j] * D[j, k]. I don't understand how to map the indices in the comment (For ndim=3 and mat_list=[X,Y,Z], this operation is equivalent to einsum("ijk,ia,ja,ka->ijk",A,X,Y,Z)) to something like this.
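For concreteness, that comment's einsum written out as explicit loops is (a dense, CTF-free sketch; tttp3 is a hypothetical name, not the library routine):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Tens3t = std::vector<std::vector<std::vector<double>>>;
using Matt = std::vector<std::vector<double>>;

// Dense reference for TTTP with ndim=3 and mat_list=[X,Y,Z]:
// out[i][j][k] = A[i][j][k] * sum_a X[i][a] * Y[j][a] * Z[k][a]
// i.e. einsum("ijk,ia,ja,ka->ijk", A, X, Y, Z)
Tens3t tttp3(const Tens3t& A, const Matt& X, const Matt& Y, const Matt& Z) {
    Tens3t out = A;  // copy for shape; every entry is overwritten below
    std::size_t R = X[0].size();
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < A[i].size(); ++j)
            for (std::size_t k = 0; k < A[i][j].size(); ++k) {
                double s = 0.0;
                for (std::size_t a = 0; a < R; ++a)
                    s += X[i][a] * Y[j][a] * Z[k][a];
                out[i][j][k] = A[i][j][k] * s;
            }
    return out;
}
```

Note the output keeps all three tensor indices; only the auxiliary index a is summed, and each matrix carries one tensor index plus a.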
I think the segfault is because you are using false for aux_mode_first (currently, there is no implementation available for aux_mode_first = false).
Sample code for MTTKRP to achieve ijk,ja,ka->ia:
// assumes T (the sparse tensor), U, V, lens, and dw are defined earlier
Tensor<dtype> UT(2, false, lens, dw);
UT["ji"] = U["ij"];
Tensor<dtype> VT(2, false, lens, dw);
VT["ji"] = V["ij"];
Tensor<dtype> UC(2, false, lens, dw);  // output factor
Tensor<dtype>* mlist[3] = {&UC, &UT, &VT};
MTTKRP<dtype>(&T, mlist, 0, true);
In my comment above (#134 (comment)), I used aux_mode_first = false, and that seemed to work in a different situation.
That seems to be the einsum notation expression I want, but I'm unsure why you are transposing the matrices.
In the comment you mentioned, the factors are vectors. Since there is only one index, aux_mode_first false or true will be the same.
I think you are seeing the segfault from this:
Line 1259 in c4f89dc
aux_mode_first = true expects the auxiliary index (in the contraction) to show up first.
Thanks @raghavendrak! Following up, do you have an explanation why you transposed the matrices in your example?
Also, wondering if you have an answer to my question about TTTP/SDDMM.
@rohany
I think Raghavendra had the transposes to compute the MTTKRP in the form you are looking for, i.e., sum_{jk} T_{ijk}U_{ja}V_{ka}, since our routine evidently can currently only do sum_{jk} T_{ijk}U_{aj}V_{ak} (U and V transposed). For SDDMM, you should use TTTP with an order 2 sparse tensor, performing the einsum("ij,ia,ja->ij",A,X,Y) to do the SDDMM Hadamard_product(A, XY^T). Let me know if all this makes sense.
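To make the mapping concrete, here is a dense, CTF-free sketch of the order-2 TTTP (sddmm_ref is a hypothetical name, not the CTF routine); it computes exactly einsum("ij,ia,ja->ij",A,X,Y), i.e., the Hadamard product of A with XY^T:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using MatD = std::vector<std::vector<double>>;

// out[i][j] = A[i][j] * sum_a X[i][a] * Y[j][a]
// i.e. the order-2 TTTP, a.k.a. SDDMM: Hadamard_product(A, X * Y^T)
MatD sddmm_ref(const MatD& A, const MatD& X, const MatD& Y) {
    std::size_t I = A.size(), J = A[0].size(), R = X[0].size();
    MatD out(I, std::vector<double>(J, 0.0));
    for (std::size_t i = 0; i < I; ++i)
        for (std::size_t j = 0; j < J; ++j) {
            double dot = 0.0;  // (X Y^T)[i][j]
            for (std::size_t a = 0; a < R; ++a)
                dot += X[i][a] * Y[j][a];
            out[i][j] = A[i][j] * dot;
        }
    return out;
}
```

Both factor matrices are indexed row-first with the auxiliary index a second, matching the orientation convention discussed below.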
Thanks, the MTTKRP transpose portion makes sense to me.
I'm still confused about SDDMM -- the einsum you wrote there is not isomorphic to what I want to compute: A[i, k] = B[i, k] * C[i, j] * D[j, k].
@rohany I think it is, you have "ik,ij,jk->ik", which is my einsum modulo transposition and substitutions j->k, a->j.
If I'm reading this correctly, you wrote einsum("ij,ia,ja->ij",A,X,Y), which translated to my notation is A[i, j] = A[i, j] * X[i, a] * Y[i, a], while my desired computation is A[i, k] = B[i, k] * C[i, j] * D[j, k]. This is not isomorphic, as the operation X[i, a] * Y[i, a] does not form a matrix multiplication between X and Y.
einsum("ij,ia,ja->ij",A,X,Y)
is A[i, j] = A[i, j] * X[i, a] * Y[j, a]
not A[i, j] = A[i, j] * X[i, a] * Y[i, a]
Yes, sorry. A[i, j] = A[i, j] * X[i, a] * Y[j, a] is still not correct, right? X[i, a] * Y[j, a] is not a matrix multiplication -- it should be X[i, a] * Y[a, j], right?
I think X[i, a] * Y[j, a] is a matrix multiplication (at least modulo transposition, as I mentioned in my post), i.e. it is XY^T.
The CTF interface assumes all matrices (X,Y) are oriented in the same way, which makes much more sense in generalizing to tensors.
That makes sense to me now. I will transpose my Y matrix to fit the interface.
Thanks for all the help! Code seems to be up and running now, so I'll close the issue.
Hi @solomonik, I was looking at this again:
The general sparse contraction/summation kernels are orders of magnitude slower than the multi-tensor MTTKRP kernel (see https://arxiv.org/pdf/2106.07135.pdf for details on this).
I was scanning the paper you linked, but I'm not finding discussion of distributed sparse MTTKRP in that paper -- is it the right link?
It's not, sorry about that; here is the right link: https://arxiv.org/pdf/1910.02371.pdf