
Comments (13)

kannon92 avatar kannon92 commented on August 15, 2024 1

So I think this is my problem.

I am able to get past the write call without any problems now. I messed up the allocation on different processors. I'm still working on getting my code working, but it is now just giving me a wrong answer, so that should be an easier fix. Thank you for the clarification on how to use Cyclops.

from ctf.

kannon92 avatar kannon92 commented on August 15, 2024

With a gdb backtrace, there is a segmentation fault when posix_memalign is called in memcontrol.cxx:328.

I tried to find an answer to this online, but I am allocating a chunk of memory larger than the max value of int, and the return value of posix_memalign is 32767. The length of the array I am allocating is 760,000, but I only passed a size of 47,500 to the write.

from ctf.

solomonik avatar solomonik commented on August 15, 2024

Hi Kevin,

The write() call is the right function for the purposes you describe. If posix_memalign() or the assert on the subsequent line fails, my first guess would be that you are running out of memory. If you write 47,500 doubles, pairs of (int64_t key, double val) will be allocated with a total size of 760,000 bytes. Does the code work with a smaller problem size? You mentioned the sequential version works; does that also use the write() function?

The write(), read(), read_local(), and contract functionality in CTF may need to use extra buffers (up to 2-3x the memory of the tensor), so running right below full memory utilization may be problematic (if you have a cluster, the solution is simply to use more nodes). You can get rid of one of the buffer allocations, the one that is crashing for you, by using the overload of write() that takes Pair<> objects (an array of structs rather than a struct of arrays); in that case no conversion occurs, since this representation is the one used internally. But CTF may still run into memory overflow elsewhere if a factor of two in buffer space is not available.

If the problem is not memory, I can try to reproduce it if you post/send the code. Seems like constructing a small example of this problem should also be viable.

By the way, it is possible to ask CTF to distribute the tensor in a predefined manner; see https://github.com/solomonik/ctf/blob/master/src/interface/tensor.h#L285. It's necessary to define a Partition object which corresponds to the processor grid, and to assign indices to define the mapping from tensor indices to processor grid dimensions. Then you can access the local buffer directly as raw_data, and it will match the distribution you request. However, the distribution will be cyclic and will be padded, so if you are just blocking along the Q index, it will be a different ordering. If you always map the Q index in the same way, the blocked-to-cyclic permutation should not matter, but dealing with this may be complex. So, I would not recommend deviating from the default mapping unless read()/write() is observed to be a significant performance bottleneck.

from ctf.

kannon92 avatar kannon92 commented on August 15, 2024

I have not gotten any test to work in parallel.

I am able to run sequential code using write(). This is not a large problem, so I should not be anywhere close to running out of memory. I am running on a cluster with 128 GB of memory, and this test case is a double tensor of shape (115 by 25 by 25), so only about 0.6 MB.

I will try to make a mock code that can reproduce this problem. I will also try using the Pair allocation. I will post the program here in a few hours.

from ctf.

solomonik avatar solomonik commented on August 15, 2024

I see. I can also recommend trying valgrind; maybe there is a memory corruption somewhere earlier. I also find valgrind to be much easier to use in combination with MPI.

The write() function with separate index and value arrays should work correctly, I doubt switching to Pairs will make a difference.

from ctf.

solomonik avatar solomonik commented on August 15, 2024

If it's easy, feel free to post the relevant code snippet rather than extracting a full working sample; that might be sufficient.

from ctf.

kannon92 avatar kannon92 commented on August 15, 2024

Thank you for the help. Everything works great now.

I have an off-topic question:

How does your tensor library perform when there is a large amount of sparsity? I notice that your paper tested it at 16% and lower. I am interested in using this library for implementing something like what is done in this paper: Journal of Chemical Theory and Computation 2016, 12 (7), 3122-3134.

I have the sparse code working, and I was going to try different ways of obtaining sparsity. Have you noticed any performance hits when there are many more zeros than nonzeros?

from ctf.

solomonik avatar solomonik commented on August 15, 2024

You can find some newer results for sparse MP3 in my ISTCP presentation (slides 14, 15) http://solomon2.web.engr.illinois.edu/talks/istcp_jul22_2016.pdf

There we test 10%, 1%, and 0.1% sparsity. There are diminishing returns from sparsity, but it depends on the particular problem. Above 10% sparsity, the sparse code can actually be slower than the dense case. In part, this is just due to how fast the MKL routines we are using are (by the way, make sure you configure with Intel if you want decent sparse performance).

Also, I should warn that currently CTF can't handle Hadamard/weigh indices with sparse tensors (dense works); sparse contractions have to be 'proper' (reducible to matmul). That limitation is described in this GitHub issue: https://github.com/solomonik/ctf/issues/23. It's already the highest-priority item for CTF, but let me know if it is also necessary for what you are doing (it may be necessary for sparse HF), and I will really try to hurry up on implementing that functionality.

from ctf.

kannon92 avatar kannon92 commented on August 15, 2024

So I'm a bit confused about the Hadamard indices.

What kinds of tensor contractions work with sparsity?

Also, my group (the Evangelista lab) is interested in this library, and I know that we already have Hadamard indices present in our code. We don't have to use the sparsity feature, but it will be of use to us at some point in the near future.

from ctf.

solomonik avatar solomonik commented on August 15, 2024

Hi Kevin,

Contractions like C["...i..."]=A["...i..."]*B["...i..."] currently don't work if A, B, or C is sparse. If all of them are dense, it works. If every index appears in only two tensors, all combinations of sparsity are supported.

Summations allow one to do B["ij"]=A["ij"] with sparse B and A, though. It's also possible to use custom types to implement C["ij"]=A["ij"]*B["ij"] by forming a Tensor P and effectively computing P["ij"]=(A["ij"],B["ij"]) (see the Function<> syntax), then C from P.

But if you need something like c["ikl"]=A["ijk"]*B["ijl"], there is no good solution at the moment for sparse A and B. Again, though, this is high priority, and I am hoping to add the functionality in the next couple of months. Let me know if you need it urgently.

from ctf.

kannon92 avatar kannon92 commented on August 15, 2024

So I have a working parallel sparse Fock builder. However, I do find some performance problems when calling read and write, so I am trying to figure out how to change the distribution of my tensor.

I am confused about how to set up the Partition object to achieve this.

Let's say I have a 3-dimensional tensor (edge lengths m, n, n), and I want to distribute along the first dimension.
I tried doing Partition part(3, {m, n, n}).
Now I have to get an Idx_Partition.
What do I do from here? If I want to distribute along m only, do I just fill the idx member of Idx_Partition with the blocking I want (i.e., P0 gets the 0th row, P1 gets the 1st row, P2 gets the 2nd row)? What should I do for the other dimensions?

Sorry for these simple questions. I have a hard time understanding how the Partition object is used for creating distributions.

from ctf.

solomonik avatar solomonik commented on August 15, 2024

One working example that defines a matrix in a specific distribution (you can add to/from it with one in a different distribution to move data more quickly than read/write):

int plens[] = {pr, pc};
Partition ip(2, plens);
Matrix<> M(nrow, ncol, "ij", ip["ij"], Idx_Partition(), 0, *this->wrld, *this->sr);

So, you'll want pr=p, pc=1. Then you can access the data via the raw_data() pointer, and it will be distributed cyclically over the mode for which you specified a mapping over p processors. Cyclic over p processors means processor j owns elements {j, j+p, j+2p, ... n-p+j}.

Hope that clarifies things; let me know if you have more questions. Sorry there are no examples for this at the moment, as it is relatively new functionality. It has also not been tested that thoroughly, so please let me know if you see issues.

But also, depending on what you need to frequently access the tensor for, it could be even more efficient to do it by applying some Function<>(), i.e., encoding it in a sum/contraction with a special elementwise operator. If this seems viable but unclear, give us more details about what you need to do.

from ctf.

kannon92 avatar kannon92 commented on August 15, 2024

Thank you for your quick response. It is no problem that there aren't any examples. I will play around with this tomorrow and I will let you know if I find any problems.

from ctf.
