Comments (5)

solomonik commented on August 15, 2024

Hi 0tt3r,

Indeed, the operations you are performing have poor absolute performance in CTF. When an index appears in the operands as well as in the output, as it does here, CTF falls back to a naive, unoptimized sequential kernel (its generality actually makes it slower than a simple unoptimized loop). This is a known limitation that we hope to improve upon in the near future.

Another problem is that CTF is not currently good at factorizing expressions like A*B*C*D. Most codes that achieve high performance using CTF are currently written in the style

C += A*B
C *= D

and are ideally dominated by contractions that can be mapped to matrix multiplication after transposition. When the operations are pure tensor contractions (each index appears in exactly two tensors), much higher sequential performance is achieved by leveraging BLAS.

Modifying your code as follows gives 4x better performance on my machine:

L["ik"] *= I["ik"];
O["ikl"] *= L["ik"];
A["ijkl"] = O["ikl"]*M["lj"];

Of course, writing code in this way is cumbersome and generally requires defining auxiliary intermediates.
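
To make the rewrite concrete, here is a minimal sketch in plain C++ (no CTF calls) comparing the three staged statements above with a single fused expression. The fused form `A[i,j,k,l] = L[i,k]*I[i,k]*O[i,k,l]*M[l,j]` is an assumption about what the original kernel computed, since it is not quoted in this thread; the `Dims` struct, the function names, and the row-major layout are likewise illustrative only. It demonstrates equivalence of results, not CTF's performance behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative extents of the four index ranges i, j, k, l.
struct Dims { std::size_t ni, nj, nk, nl; };

// Assumed fused form: A[i,j,k,l] = L[i,k] * I[i,k] * O[i,k,l] * M[l,j].
// All tensors are dense and stored row-major in flat vectors.
std::vector<double> fused(const Dims& d,
                          const std::vector<double>& L,    // ni x nk
                          const std::vector<double>& I,    // ni x nk
                          const std::vector<double>& O,    // ni x nk x nl
                          const std::vector<double>& M) {  // nl x nj
  assert(L.size() == d.ni * d.nk && I.size() == d.ni * d.nk);
  assert(O.size() == d.ni * d.nk * d.nl && M.size() == d.nl * d.nj);
  std::vector<double> A(d.ni * d.nj * d.nk * d.nl);
  for (std::size_t i = 0; i < d.ni; ++i)
    for (std::size_t j = 0; j < d.nj; ++j)
      for (std::size_t k = 0; k < d.nk; ++k)
        for (std::size_t l = 0; l < d.nl; ++l)
          A[((i * d.nj + j) * d.nk + k) * d.nl + l] =
              L[i * d.nk + k] * I[i * d.nk + k] *
              O[(i * d.nk + k) * d.nl + l] * M[l * d.nj + j];
  return A;
}

// Staged form mirroring the three statements in the comment above.
// L and O are taken by value because the staged version overwrites them.
std::vector<double> staged(const Dims& d,
                           std::vector<double> L,
                           const std::vector<double>& I,
                           std::vector<double> O,
                           const std::vector<double>& M) {
  assert(L.size() == d.ni * d.nk && I.size() == d.ni * d.nk);
  assert(O.size() == d.ni * d.nk * d.nl && M.size() == d.nl * d.nj);
  // L["ik"] *= I["ik"];
  for (std::size_t x = 0; x < L.size(); ++x) L[x] *= I[x];
  // O["ikl"] *= L["ik"];
  for (std::size_t i = 0; i < d.ni; ++i)
    for (std::size_t k = 0; k < d.nk; ++k)
      for (std::size_t l = 0; l < d.nl; ++l)
        O[(i * d.nk + k) * d.nl + l] *= L[i * d.nk + k];
  // A["ijkl"] = O["ikl"]*M["lj"];
  std::vector<double> A(d.ni * d.nj * d.nk * d.nl);
  for (std::size_t i = 0; i < d.ni; ++i)
    for (std::size_t j = 0; j < d.nj; ++j)
      for (std::size_t k = 0; k < d.nk; ++k)
        for (std::size_t l = 0; l < d.nl; ++l)
          A[((i * d.nj + j) * d.nk + k) * d.nl + l] =
              O[(i * d.nk + k) * d.nl + l] * M[l * d.nj + j];
  return A;
}
```

The staged version does the `L*I` and `O*L` products once per `(i,k)` or `(i,k,l)` entry instead of once per output element, which is the kind of intermediate reuse the auxiliary tensors buy.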

The only positive news I have is that CTF is at least expected to scale well for such code on a distributed memory system. On my desktop, execution time goes from 12.8 to 3.2 sec going from 1 process to 4. After that there is much less return for increasing the number of processes (despite the presence of 12 cores), likely due to memory bandwidth saturation.

I'll leave this issue open as it highlights two of the main performance deficiencies of CTF currently.

from ctf.

0tt3r commented on August 15, 2024

Hi solomonik,

Thank you for the quick reply! Rewriting the kernel like that greatly improves performance. That is great to hear. I was also seeing poor parallel performance with the original code, and now I know why.

The other thing I'd like to do is leverage the sparsity. L is ~1% sparse, but O, I, and M are all dense. In some slides you presented, you showed decent speedup at this level of sparsity. I know there is an open issue about using sparsity, and I think that my case falls into the affected area. I'd like to write:

L.sparsify(0.99);
L["ik"] *= I["ik"];
O["ikl"] *= L["ik"];
O.sparsify();
A["ijkl"] = O["ikl"]*M["lj"];

Is this possible using custom functions, as you mention elsewhere? Would I see any speedup if I did that?


solomonik commented on August 15, 2024

Unfortunately, this won't work until issue #23 is resolved.


0tt3r commented on August 15, 2024

That's what I expected. Thank you for your time!


solomonik commented on August 15, 2024

Cyclops v1.5.0 will have integration with MKL batched GEMM as well as better naive kernels for plain Hadamard products. The contractions you mentioned should actually work just fine. I think I previously assumed that one of A, O, or M was sparse, so that the last contraction would involve sparse tensors. I hope to add naive support for that soon as well, simply by converting to a higher-order sparse tensor; e.g., a pointwise product of a sparse vector and a dense vector can be written as a product of a sparse matrix and a dense vector.
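
The sparse-vector example in the last sentence can be sketched in plain C++ as follows. This is one possible reading, assuming the "sparse matrix" is the diagonal matrix with `S[i,i] = s[i]`; the types `SparseEntry` and `CooEntry` and all function names are hypothetical and are not part of the Cyclops API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One stored nonzero of a sparse vector: its position and its value.
struct SparseEntry { std::size_t index; double value; };

// One stored nonzero of a sparse matrix in coordinate (COO) form.
struct CooEntry { std::size_t row, col; double value; };

// Direct pointwise product z[i] = s[i] * d[i], visiting only stored nonzeros of s.
std::vector<double> hadamard(const std::vector<SparseEntry>& s,
                             const std::vector<double>& d) {
  std::vector<double> z(d.size(), 0.0);
  for (const SparseEntry& e : s) {
    assert(e.index < d.size());
    z[e.index] += e.value * d[e.index];
  }
  return z;
}

// Lift the sparse vector to a higher-order sparse object:
// the diagonal matrix with S[i,i] = s[i] (an assumed interpretation).
std::vector<CooEntry> to_diagonal(const std::vector<SparseEntry>& s) {
  std::vector<CooEntry> S;
  S.reserve(s.size());
  for (const SparseEntry& e : s) S.push_back({e.index, e.index, e.value});
  return S;
}

// Generic sparse matrix times dense vector: y[row] += S[row,col] * x[col].
std::vector<double> spmv(const std::vector<CooEntry>& S,
                         const std::vector<double>& x,
                         std::size_t rows) {
  std::vector<double> y(rows, 0.0);
  for (const CooEntry& e : S) {
    assert(e.row < rows && e.col < x.size());
    y[e.row] += e.value * x[e.col];
  }
  return y;
}
```

With this embedding, `spmv(to_diagonal(s), d, d.size())` yields the same vector as `hadamard(s, d)`, so an existing sparse contraction kernel can stand in for a missing sparse Hadamard kernel at the cost of storing one extra index per nonzero.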
