Comments (5)
Hi 0tt3r,
Indeed, the operations you are performing have poor absolute performance in CTF. When an index appears in the operands as well as the output, as here, CTF falls back to a naive, unoptimized sequential kernel (its generality actually makes it slower than a simple unoptimized loop). This is a known limitation that we hope to improve upon in the near future.
Another problem is that CTF is currently not good at factorizing expressions like A*B*C*D. Most codes that achieve high performance with CTF are currently written in the style
C += A*B
C *= D
ideally consisting of (or at least dominated by) contractions that can be mapped to matrix multiplication after transposition. When the operations are pure tensor contractions (each index appears in exactly two tensors), much higher sequential performance is achieved by leveraging BLAS.
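As a rough illustration of the "each index appears in exactly two tensors" criterion, the sketch below counts index labels across the output and the two operands. The function name `is_pure_contraction` is hypothetical and not part of CTF; it only assumes single-character index labels, as in the snippets in this thread, and that no label repeats within one tensor.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <set>
#include <string>

// Hypothetical helper (not a CTF function): returns true when every index
// label appears in exactly two of the three tensors (output, lhs, rhs).
// Assumes a label never repeats within a single tensor; this is asserted.
bool is_pure_contraction(const std::string& out,
                         const std::string& lhs,
                         const std::string& rhs) {
    std::map<char, int> count;
    for (const std::string* idx : {&out, &lhs, &rhs}) {
        std::set<char> seen;
        for (char c : *idx) {
            bool fresh = seen.insert(c).second;
            assert(fresh && "label repeated within one tensor");
            (void)fresh;  // silence unused warning when NDEBUG is set
            ++count[c];
        }
    }
    for (const auto& kv : count) {
        if (kv.second != 2) return false;
    }
    return true;
}
```

With this check, a matrix multiplication such as `C["ij"] = A["ik"]*B["kj"]` qualifies, whereas `A["ijkl"] = O["ikl"]*M["lj"]` does not, because `l` appears in both operands and in the output.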
Modifying your code in the following fashion gives 4x better performance on my machine:
L["ik"] *= I["ik"];
O["ikl"] *= L["ik"];
A["ijkl"] = O["ikl"]*M["lj"];
Of course, writing code in this way is cumbersome and generally requires defining auxiliary intermediates.
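To make the semantics of the staged statements concrete, here is a minimal plain-loop sketch in standard C++ (no CTF calls). The original single-statement kernel is not quoted in this thread, so `fused` is only an assumed shape for it; the names `Tensor`, `staged`, `fused`, and `max_abs_diff` as well as the row-major layout are illustrative choices, not CTF's.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Tensor = std::vector<double>;  // dense, row-major storage (assumption)

// Staged form: the three statements above as explicit loops.
// L (ni x nk) and O (ni x nk x nl) are overwritten in place.
Tensor staged(Tensor& L, const Tensor& I, Tensor& O, const Tensor& M,
              std::size_t ni, std::size_t nj, std::size_t nk, std::size_t nl) {
    assert(L.size() == ni * nk && I.size() == ni * nk);
    assert(O.size() == ni * nk * nl && M.size() == nl * nj);
    for (std::size_t ik = 0; ik < ni * nk; ++ik)       // L["ik"] *= I["ik"]
        L[ik] *= I[ik];
    for (std::size_t ik = 0; ik < ni * nk; ++ik)       // O["ikl"] *= L["ik"]
        for (std::size_t l = 0; l < nl; ++l)
            O[ik * nl + l] *= L[ik];
    Tensor A(ni * nj * nk * nl);
    for (std::size_t i = 0; i < ni; ++i)               // A["ijkl"] = O["ikl"]*M["lj"]
        for (std::size_t j = 0; j < nj; ++j)
            for (std::size_t k = 0; k < nk; ++k)
                for (std::size_t l = 0; l < nl; ++l)
                    A[((i * nj + j) * nk + k) * nl + l] =
                        O[(i * nk + k) * nl + l] * M[l * nj + j];
    return A;
}

// Assumed fused form: all four factors multiplied inside one loop nest.
Tensor fused(const Tensor& L, const Tensor& I, const Tensor& O, const Tensor& M,
             std::size_t ni, std::size_t nj, std::size_t nk, std::size_t nl) {
    Tensor A(ni * nj * nk * nl);
    for (std::size_t i = 0; i < ni; ++i)
        for (std::size_t j = 0; j < nj; ++j)
            for (std::size_t k = 0; k < nk; ++k)
                for (std::size_t l = 0; l < nl; ++l)
                    A[((i * nj + j) * nk + k) * nl + l] =
                        L[i * nk + k] * I[i * nk + k] *
                        O[(i * nk + k) * nl + l] * M[l * nj + j];
    return A;
}

// Largest elementwise difference between two equally sized tensors.
double max_abs_diff(const Tensor& a, const Tensor& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t x = 0; x < a.size(); ++x)
        d = std::max(d, std::fabs(a[x] - b[x]));
    return d;
}
```

Note that `staged` mutates `L` and `O`, just like the `*=` statements do, so a comparison has to call `fused` on the untouched inputs first. The staged form also does less arithmetic, since the products over `ik` and `ikl` are computed once rather than once per `j`.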
The only positive news I have is that CTF is at least expected to scale well for such code on a distributed-memory system. On my desktop, execution time goes from 12.8 to 3.2 sec going from 1 process to 4. Beyond 4 processes, adding more yields much smaller returns (despite the presence of 12 cores), likely due to memory bandwidth saturation.
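For reference, the timings quoted above correspond to a speedup of 12.8 / 3.2 = 4 on 4 processes, i.e. a parallel efficiency of 100%. A small sketch of that arithmetic follows; the names `Scaling` and `strong_scaling` are illustrative only.

```cpp
#include <cassert>

struct Scaling {
    double speedup;     // t_serial / t_parallel
    double efficiency;  // speedup divided by the number of processes
};

// Strong-scaling figures for a fixed problem size.
Scaling strong_scaling(double t_serial, double t_parallel, int processes) {
    assert(t_serial > 0.0 && t_parallel > 0.0 && processes > 0);
    double speedup = t_serial / t_parallel;
    return {speedup, speedup / processes};
}
```

Calling `strong_scaling(12.8, 3.2, 4)` reproduces the numbers from this comment. Measuring the same kernel at 8 or 12 processes would show how quickly the efficiency drops once memory bandwidth becomes the limit.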
I'll leave this issue open as it highlights two of the main performance deficiencies of CTF currently.
from ctf.
Hi solmonik,
Thank you for the quick reply! Rewriting the kernel like that greatly improves performance. That is great to hear. I was also seeing poor parallel performance with the original code, and now I know why.
The other thing I'd like to do is leverage sparsity. L is ~1% sparse, but O, I, and M are all dense. In some slides you presented, you showed a decent speedup at this level of sparsity. I know there is an open issue about sparsity, and I think my case falls into the affected area. I'd like to write:
L.sparsify(0.99);
L["ik"] *= I["ik"];
O["ikl"] *= L["ik"];
O.sparsify();
A["ijkl"] = O["ikl"]*M["lj"];
Is this possible using custom functions, as you mention elsewhere? Would I see any speedup if I did that?
from ctf.
Unfortunately, this won't work until issue #23 is resolved.
from ctf.
That's what I expected. Thank you for your time!
from ctf.
Cyclops v1.5.0 will have integration with MKL batched GEMM as well as better naive kernels for plain Hadamard products. The contractions you mentioned should actually work just fine. I think I previously assumed that one of A, O, or M was sparse, so that the last contraction would involve sparse tensors. I hope to add naive support for that soon as well, simply by converting to a higher-order sparse tensor; e.g., a pointwise product of a sparse vector and a dense vector can be written as a product of a sparse matrix and a dense vector.
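One way to read the last remark is to lift the sparse vector to a diagonal sparse matrix, so that the pointwise product becomes an ordinary sparse matrix-vector product. The sketch below shows that idea in standard C++ with a simple coordinate format. It is an assumed interpretation, not CTF's implementation, and every type and function name in it (`SparseVec`, `CooMatrix`, `to_diagonal`, `spmv`, `hadamard`) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse vector of length n: nonzero positions and their values.
struct SparseVec {
    std::size_t n;
    std::vector<std::size_t> idx;
    std::vector<double> val;
};

// Sparse matrix in coordinate (COO) format.
struct CooMatrix {
    std::size_t rows, cols;
    std::vector<std::size_t> row, col;
    std::vector<double> val;
};

// Lift s to the diagonal matrix D with D(i,i) = s(i).
CooMatrix to_diagonal(const SparseVec& s) {
    return CooMatrix{s.n, s.n, s.idx, s.idx, s.val};
}

// y = D * v for a COO matrix D and a dense vector v.
std::vector<double> spmv(const CooMatrix& D, const std::vector<double>& v) {
    assert(v.size() == D.cols);
    std::vector<double> y(D.rows, 0.0);
    for (std::size_t p = 0; p < D.val.size(); ++p)
        y[D.row[p]] += D.val[p] * v[D.col[p]];
    return y;
}

// Direct pointwise product y(i) = s(i) * v(i), for comparison.
std::vector<double> hadamard(const SparseVec& s, const std::vector<double>& v) {
    assert(v.size() == s.n);
    std::vector<double> y(s.n, 0.0);
    for (std::size_t p = 0; p < s.val.size(); ++p)
        y[s.idx[p]] = s.val[p] * v[s.idx[p]];
    return y;
}
```

Both routines touch only the stored nonzeros of `s`. The result is returned as a dense vector here purely for ease of comparison; a real implementation would presumably keep it sparse.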
from ctf.