
Elemental.jl

A package for dense and sparse distributed linear algebra and optimization. The underlying functionality is provided by the C++ library Elemental, originally written by Jack Poulson and now maintained by LLNL.

Installation

The package is installed with Pkg.add("Elemental"). For Julia versions 1.3 and later, Elemental uses binaries provided by BinaryBuilder, which are linked against the MPI (MPICH) build also provided through BinaryBuilder.
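
For example, from the Julia REPL (or, equivalently, add Elemental in the Pkg REPL mode):

julia> using Pkg

julia> Pkg.add("Elemental")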

Examples

Each of these examples should be run in a separate Julia session.

SVD example

This example runs on a single processor and initializes MPI under the hood. Explicit use of MPI.jl is not required here, unlike in the examples below.

julia> using LinearAlgebra, Elemental

julia> A = Elemental.Matrix(Float64)
0x0 Elemental.Matrix{Float64}

julia> Elemental.gaussian!(A, 100, 80);

julia> U, s, V = svd(A);

julia> convert(Matrix{Float64}, s)[1:10]
10-element Array{Float64,1}:
 19.8989
 18.2702
 17.3665
 17.0475
 16.4513
 16.3197
 16.0989
 15.8353
 15.5947
 15.5079

SVD example using MPI to parallelize on 4 processors

In this example, @mpi_do has to be used to send the parallel instructions to all processors.

julia> using MPI, MPIClusterManagers, Distributed

julia> man = MPIManager(np = 4);

julia> addprocs(man);

julia> @everywhere using LinearAlgebra, Elemental

julia> @mpi_do man A = Elemental.DistMatrix(Float64);

julia> @mpi_do man Elemental.gaussian!(A, 1000, 800);

julia> @mpi_do man U, s, V = svd(A);

julia> @mpi_do man println(s[1])
    From worker 5:  59.639990420817696
    From worker 4:  59.639990420817696
    From worker 2:  59.639990420817696
    From worker 3:  59.639990420817696

SVD example with DistributedArrays on 4 processors

This example differs slightly from the ones above in that it only computes the singular values. It uses the DistributedArrays.jl package and has a single thread of control. Note that @mpi_do is not needed explicitly in this case.

julia> using MPI, MPIClusterManagers, Distributed

julia> man = MPIManager(np = 4);

julia> addprocs(man);

julia> using DistributedArrays, Elemental

julia> A = drandn(1000, 800);

julia> Elemental.svdvals(A)[1:5]
5-element SubArray{Float64,1,DistributedArrays.DArray{Float64,2,Array{Float64,2}},Tuple{UnitRange{Int64}},0}:
 59.4649
 59.1984
 59.0309
 58.7178
 58.389
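
The value returned by Elemental.svdvals(A) is a view into a DArray (see the type printed above). To work with it as an ordinary local vector on the master process, it can be materialized with collect; a minimal sketch (collect is plain Julia, nothing Elemental-specific is assumed):

julia> s = collect(Elemental.svdvals(A));   # an ordinary Array{Float64,1}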

Truncated SVD

The iterative SVD algorithm is implemented in pure Julia, but the factorized matrix as well as the Lanczos vectors are stored as distributed matrices in Elemental. Note that TSVD.jl doesn't depend on Elemental and uses Elemental.jl only through generic function calls.

julia> using MPI, MPIClusterManagers, Distributed

julia> man = MPIManager(np = 4);

julia> addprocs(man);

julia> @mpi_do man using Elemental, TSVD, Random

julia> @mpi_do man A = Elemental.DistMatrix(Float64);

julia> @mpi_do man Elemental.gaussian!(A, 5000, 2000);

julia> @mpi_do man Random.seed!(123) # to avoid different initial vectors on the workers

julia> @mpi_do man r = tsvd(A, 5);

julia> @mpi_do man println(r[2][1:5])
    From worker 3:  [1069.6059089732858,115.44260091060129,115.08319164529792,114.87007788947226,114.48092348847719]
    From worker 5:  [1069.6059089732858,115.44260091060129,115.08319164529792,114.87007788947226,114.48092348847719]
    From worker 2:  [1069.6059089732858,115.44260091060129,115.08319164529792,114.87007788947226,114.48092348847719]
    From worker 4:  [1069.6059089732858,115.44260091060129,115.08319164529792,114.87007788947226,114.48092348847719]

Linear Regression

@mpi_do man A = Elemental.DistMatrix(Float32)
@mpi_do man B = Elemental.DistMatrix(Float32)
@mpi_do man copyto!(A, Float32[2 1; 1 2])
@mpi_do man copyto!(B, Float32[4, 5])

Run distributed ridge regression ½|A*X-B|₂² + λ|X|₂²

@mpi_do man X = Elemental.ridge(A, B, 0f0)
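
As a quick sanity check (a sketch; it assumes linear indexing into a DistMatrix works as in the SVD examples above): with λ = 0 the ridge problem reduces to ordinary least squares, and the exact solution of this 2×2 system is X = [1, 2], so each rank should print those two values.

@mpi_do man println(X[1], " ", X[2])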

Run distributed lasso regression ½|A*X-B|₂² + λ|X|₁ (only supported in recent versions of Elemental)

@mpi_do man X = Elemental.bpdn(A, B, 0.1f0)

Coverage

Right now, the best way to see whether a specific function is available is to look through the source code. We are looking for help preparing Documenter.jl-based documentation for this package, and also adding more functionality from the Elemental library.
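
One quick way to get an overview from the REPL is to list the names defined in the module; a minimal sketch (this only shows the Julia-side names, not which Elemental routines they wrap):

julia> using Elemental

julia> filter(n -> !startswith(string(n), "#"), names(Elemental; all = true))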


Elemental.jl's Issues

Examples of Communications Patterns in Elemental

I am not sure how to convert a local Array into an Elemental.DistMatrix. The examples in the README (and the one I have used in #72) use Elemental.gaussian! to fill a DistMatrix. However, for a realistic workflow, I need to load distributed pieces of data (each local to a rank) and "fill" the DistMatrix locally. I have seen grid objects mentioned, so there must be some way to specify which indices map to which ranks (like https://github.com/eth-cscs/ImplicitGlobalGrid.jl), but I cannot find any documentation/examples.

Clearly Elemental.gaussian! operates locally for each rank -- e.g. if I do:

using MPI, Elemental

comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
A = Elemental.DistMatrix(Float64);
if rank == 0
    Elemental.gaussian!(A, 4000, 3200);   # only rank 0 executes this
end

the program hangs.

Can you point me to the right place?

How do I use svdvals in parallel?

My goal is to add svd(A::DArray) to the Julia interface to Elemental. I have been looking at svdvals to understand how to do it, but I do not know how to use svdvals in parallel. The following program has a sum of diffs close to 0 when using a single process. When I run it with julia -p 2 or mpiexecjl -n 2, the sum of diffs is large. How do I convert the program to run in parallel?

using Elemental
using DistributedArrays
using LinearAlgebra

A  = drandn(50,50)
Al = Matrix(A)

a = svdvals(Al)
b = Elemental.svdvals(A)
println("sum of diffs= ",sum(a-b))

"ERROR: cannot serialize a pointer" with @async and @sync

With the latest changes I've just pushed to master, I can copy a DArray into a DistMatrix on the MPI workers and get the RemoteRefs back, e.g.

julia> using MPI

julia> man = MPIManager(np = 2);

julia> addprocs(man);

julia> using DistributedArrays

julia> using Elemental

julia> A = drandn(100,100);

julia> tmp = Elemental.toback(A)
1x2 Array{Any,2}:
 RemoteRef{Channel{Any}}(2,1,16)  RemoteRef{Channel{Any}}(3,1,17)

and then I can compute the singular values with

julia> for r in tmp
       remotecall(() -> println(svdvals(fetch(r))), r.where)
       end

julia>  From worker 3:  [19.928126878869584
    From worker 2:  [19.928126878869584
    From worker 3:   19.248234662565334
    From worker 2:   19.248234662565334
    From worker 2:   18.378572289263257
    From worker 3:   18.378572289263257
    From worker 2:   18.271178929809935

which is fine, and I will start wrapping this array of RemoteRefs in some type representing the remote DistMatrix. However, I'd like to use the @sync for ... @async remotecall_wait pattern as well, but if I try

julia> @time @sync for r in tmp
       @async remotecall_fetch(() -> svdvals(fetch(r)), r.where)
       end
    From worker 3:  fatal error on 3: ERROR: cannot serialize a pointer
    From worker 2:  fatal error on 2: ERROR: cannot serialize a pointer
    From worker 3:   in serialize at serialize.jl:418 (repeats 2 times)
    From worker 2:   in serialize at serialize.jl:418 (repeats 2 times)

cc: @jakebolewski

readme suggestions

Simple SVD example without MPI
---> it is more accurate to say
Simple SVD example without explicit use of MPI.jl (MPI on 1 processor is under the hood)

Understanding Elemental's Performance

Hi,

I am trying to understand the performance of this program at NERSC -- it is basically the same as the example in the README.md, except that addprocs currently doesn't work for me, so I am using this (manual) approach of running the MPIClusterManager with start_main_loop and stop_main_loop:

N = parse(Int64, ARGS[1])

# to import MPIManager
using MPIClusterManagers

# need to also import Distributed to use addprocs()
using Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    println(
            "Hello world,"
            * " I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))"
            * " on node $(gethostname())"
           )

    println("[rank $(MPI.Comm_rank(comm))]: Importing Elemental")
    using LinearAlgebra, Elemental
    println("[rank $(MPI.Comm_rank(comm))]: Done importing Elemental")

    println("[rank $(MPI.Comm_rank(comm))]: Solving SVD for $(N)x$(N)")
end

@mpi_do manager A = Elemental.DistMatrix(Float64);
@mpi_do manager Elemental.gaussian!(A, N, N);
@mpi_do manager @time U, s, V = svd(A); 
@mpi_do manager println(s[1])

# Manage MPIManager manually:
# Elemental needs to be finalized before shutting down MPIManager
@mpi_do manager begin
    println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end
# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)

I ran some strong scaling tests on 4 Intel Haswell nodes (https://docs.nersc.gov/systems/cori/#haswell-compute-nodes) using 4000x4000, 8000x8000, and 16000x16000 random matrices.

[chart: strong-scaling svd(A) timings for the three matrix sizes]

I am measuring only the svd(A) time. I am attaching my measured times and wanted to check if this is what you would expect. I am not an expert in how Elemental computes SVDs in a distributed fashion, so I would be grateful for any advice you have for optimizing this benchmark's performance. In particular, I am interested in understanding what the optimal number of ranks is as a function of problem size (I am hoping this is such an obvious question that you can point me to some existing documentation).

Cheers!

No longer builds on Julia master after libgit2 change

julia> Pkg.build("Elemental")
INFO: Building Elemental
========================================================[ ERROR: Elemental ]========================================================

LoadError: UndefVarError: Git not defined
while loading /Users/jiahao/.julia/v0.5/Elemental/deps/build.jl, in expression starting on line 1

====================================================================================================================================

README example gives BoundsError

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+1276 (2015-11-14 12:53 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 9c9a180 (1 day old master)
|__/                   |  x86_64-apple-darwin15.0.0

julia> using Elemental

julia> A = Elemental.Matrix(Float64);

julia> U,s,V = Elemental.svd(A)
(0x0 Elemental.Matrix{Float64},0x1 Elemental.Matrix{Float64},0x0 Elemental.Matrix{Float64})

julia> convert(Matrix{Float64}, s)[1:10]
ERROR: BoundsError: attempt to access 0x1 Array{Float64,2}
  at index [1:10]
 in throw_boundserror at abstractarray.jl:147
 [inlined code] from abstractarray.jl:131
 in getindex at array.jl:279
 in eval at ./boot.jl:264

norm(matrix) -> opnorm(matrix) in Julia 0.7

It looks like this package may be using norm(matrix). In Julia 0.7, this will compute the Frobenius norm (vecnorm in Julia 0.6), due to JuliaLang/julia#27401. If you want the induced/operator norm as in Julia 0.6, use opnorm(matrix) instead, or Compat.opnorm(matrix) to work in both 0.6 and 0.7 (JuliaLang/Compat.jl#577).

Note that, for testing purposes, rather than @test norm(A - B) ≤ tol, it is usually preferred to do @test A ≈ B or @test A ≈ B rtol=... (which uses isapprox).
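
For reference, a small sketch of the suggested change (the matrices below are placeholders, not taken from this package's tests):

using LinearAlgebra, Test

A = [1.0 0.0; 0.0 1.0]
B = A .+ 1e-12

norm(A - B)    # Julia 0.7+: Frobenius norm of the difference (vecnorm in 0.6)
opnorm(A - B)  # induced/operator norm, i.e. what norm(A - B) meant in 0.6
@test A ≈ B    # preferred in tests; a tolerance can be passed via the rtol keyword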

SVD takes too much memory (and time)

I am doing an SVD of a 10^4 x 10^4 matrix using Elemental. That's about 1 GB of RAM, and the computer has 64 GB. With LAPACK (Julia's svd), it takes about 3 minutes using 16 cores. With Elemental, it ran for over 15 minutes, started heavily swapping to disk, and I had to kill it.

julia> BLAS.set_num_threads(1)

julia> using MPI, MPIClusterManagers, Distributed

julia> MPIManager(np=16)
MPI.MPIManager(np=16,launched=false,mode=MPI_ON_WORKERS)

julia> man = ans
MPI.MPIManager(np=16,launched=false,mode=MPI_ON_WORKERS)

julia> addprocs(man);

julia> @everywhere using LinearAlgebra, Elemental

julia> @mpi_do man A = Elemental.DistMatrix(Float64);

julia> @mpi_do man Elemental.gaussian!(A, 10000, 10000);

julia> @time @mpi_do man U, s, V = svd(A);
^C^C^C^C^C^C^C^C^C^C^C^CERROR: InterruptException:

Print distributed matrices

We'll have to figure out how to print a DistMatrix without getting nworkers() copies of the matrix in the output. Branching such that only rank zero prints hangs.

julia> using MPI

julia> man = MPIManager(np = 4);

julia> addprocs(man);

julia> using Elemental

julia> @mpi_do man A = Elemental.DistMatrix(Float64)

julia> @mpi_do man Elemental.gaussian!(A, 10, 10)

julia> @mpi_do man if MPI.Comm_rank(MPI.COMM_WORLD) == 0; println(A); end # hangs

Pkg.build on OS X

Because the gcc on OS X doesn't support OpenMP, I failed to install this package. I have tried installing gcc with "brew install gcc" and making an alias gcc='gcc-6', but the package installer still doesn't work.

How to go from files to distributed matrix.

I have files, each holding one column of an array. I would like to create an Elemental.DistMatrix from these files and load the DistMatrix in parallel. An earlier question was answered by pointing to Elemental/test/lav.jl, so I made the following program by extracting from lav.jl. It works on a single node and hangs on 2 nodes using mpiexecjl. I am using Julia 1.5 on a 4-core machine running CentOS 7.5. Please let me know what is wrong with the program and how to load my column array files in parallel. I intend to eventually run a program using DistMatrix on a computer with hundreds of cores.

# to import MPIManager
using MPIClusterManagers, Distributed

# Manage MPIManager manually -- all MPI ranks do the same work
# Start MPIManager
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)

# Init an Elemental.DistMatrix
@everywhere function spread(n0, n1)
    println("start spread")
    height = n0 * n1
    width = n0 * n1
    h = El.Dist(n0)
    w = El.Dist(n1)
    A = El.DistMatrix(Float64)
    El.gaussian!(A, n0, n1) # how to init size ?
    localHeight = El.localHeight(A)
    println("localHeight ", localHeight)
    El.reserve(A, 6 * localHeight) # number of queue entries
    println("after reserve")
    for sLoc in 1:localHeight
        s = El.globalRow(A, sLoc)
        x0 = ((s - 1) % n0) + 1
        x1 = div(s - 1, n0) + 1
        El.queueUpdate(A, s, s, 11.0)
        println("sLoc $sLoc, x0 $x0")
        if x0 > 1
            El.queueUpdate(A, s, s - 1, -10.0)
            println("after q")
        end
        if x0 < n0
            El.queueUpdate(A, s, s + 1, 20.0)
        end
        if x1 > 1
            El.queueUpdate(A, s, s - n0, -30.0)
        end
        if x1 < n1
            El.queueUpdate(A, s, s + n0, 40.0)
        end
        # The dense last column
        # El.queueUpdate(A, s, width, floor(-10/height))
    end # for
    println("before processQueues")
    El.processQueues(A)
    println("after processQueues") # with 2 nodes never gets here
    return A
end

@mpi_do manager begin
    using MPI, LinearAlgebra, Elemental
    const El = Elemental
    res = spread(4, 4)
    println("res=", res)

    # Manage MPIManager manually:
    # Elemental needs to be finalized before shutting down MPIManager
    # println("[rank $(MPI.Comm_rank(comm))]: Finalizing Elemental")
    Elemental.Finalize()
    # println("[rank $(MPI.Comm_rank(comm))]: Done finalizing Elemental")
end # mpi_do

# Shut down MPIManager
MPIClusterManagers.stop_main_loop(manager)

Thank you

Can not copy DistMatrix

# mpiexecjl -n 2 julia dcopy.jl

# This code tries to copy, in parallel, an El.DistMatrix to another El.DistMatrix.
# The intention is for each rank to deal with just its portion of the data.
# My actual goal is to create an El.DistMatrix with one less column.

# Please show me how to do a copy with loops, not just an assignment.
# The code works when a single rank is used.

using MPIClusterManagers
manager = MPIClusterManagers.start_main_loop(MPI_TRANSPORT_ALL)
@mpi_do manager begin
    using Elemental
    const El = Elemental
end

@mpi_do manager begin
    using MPI
    comm = MPI.COMM_WORLD
    myrank = MPI.Comm_rank(comm)
    num_ranks = MPI.Comm_size(comm)
    MPI.Barrier(comm)
end

@mpi_do manager begin
    height = 4
    width = 6
    dfrom = El.DistMatrix(Float32, El.STAR, El.VC)
    El.zeros!(dfrom, height, width)
    mul = 1.0/num_ranks
    
    # dfrom is a DistMatrix where each value is its coordinate in the array
    for i = 1:width
        for j = 1:height
            El.queueUpdate(dfrom, j, i, Float32(mul * (i*1000+j)))
        end
    end
    El.processQueues(dfrom)
    MPI.Barrier(comm)
    
    dto = El.DistMatrix(Float32, El.STAR, El.VC)
    
    # dto is the copy
    El.zeros!(dto, height, width) 
    lh = El.localHeight(dfrom)
    if lh != height
        println("$myrank lh != height  ")
        return
    end    
    lw = El.localWidth(dto)
    # All members of a column are in a single rank.
    for i = 1:lw 
        colto = El.globalCol(dto, i)
        #println("$myrank $colto  ")
        for j = 1:height
            #println("$myrank j= $j, colto= $colto\n")
            El.queueUpdate(dto, j, colto, dfrom[j, colto])
        end # j
    end # i

    El.processQueues(dto)
    MPI.Barrier(comm)
    # print dto
    d = collect(dto)
    dfromc = collect(dfrom)
        #println("$myrank dfrom= $dfrom")
        #println("$myrank dto= $dto")
    if myrank == 0

        println("------\n  ")
        println("* Lines with expected value.")
        for i = 1:height
            s = " "
            sf = "*"
            for j = 1:width
                sf = sf * "$(dfromc[i, j]),  "
                s = s * "$(d[i, j]),  "
            end
            println(sf, " ")
            println(s,"  ")
        end
    end
end

@mpi_do manager Elemental.Finalize()
 
MPIClusterManagers.stop_main_loop(manager)
println("main loop stopped")

Provide BB binaries

Now that we are close to having MPI.jl use BinaryBuilder (BB) MPI binaries, it would be nice to have Elemental through BB too.

Broken link in README

The Jack Poulson link in the README is broken. This seems to be a current page for him, or perhaps it could link to the paper?

Link to Elemental library points to nirvana

Hello,

the two links directly visible at the top of Readme.md (and in some other places, I suspect), to https://libelemental.org/ and http://web.stanford.edu/~poulson/, appear to be invalid. The https://libelemental.org/ link gives an empty/placeholder WordPress page, and http://web.stanford.edu/~poulson/ is Stanford's "Object Not Found" page, at least at the time I followed them.

Best Regards

Support for Elemental built against system MPI

Is there any plan to support MVAPICH2 and Intel MPI?

Intel MPI (which is binary compatible with MVAPICH2) is also one of the most widely used MPI implementations. It would be nice for Elemental to support MVAPICH2!

Linear Regression example in README doesn't work

All the other examples work fine. The linear regression doesn't work.

julia> @mpi_do man copy!(A, Float32[2 1; 1 2])
ERROR: TaskFailedException:
On worker 4:
ArgumentError: arrays must have the same axes for copy! (consider using `copyto!`)
copy! at ./abstractarray.jl:719
top-level scope at none:1
eval at ./boot.jl:331 [inlined]
#7 at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:510
#103 at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:290
run_work_thunk at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
run_work_thunk at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:88
#96 at ./task.jl:356
Stacktrace:
 [1] #remotecall_fetch#143 at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Future) at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:386
 [3] remotecall_fetch(::Function, ::Int64, ::Future; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
 [4] remotecall_fetch at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421 [inlined]
 [5] macro expansion at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:493 [inlined]
 [6] (::MPIClusterManagers.var"#25#28"{Future})() at ./task.jl:356

...and 3 more exception(s).

Stacktrace:
 [1] sync_end(::Channel{Any}) at ./task.jl:314
 [2] macro expansion at ./task.jl:333 [inlined]
 [3] mpi_do(::MPIManager, ::Function) at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:490
 [4] top-level scope at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:511

julia> @mpi_do man copy!(B, Float32[4, 5])
ERROR: TaskFailedException:
On worker 4:
ArgumentError: arrays must have the same axes for copy! (consider using `copyto!`)
copy! at ./abstractarray.jl:719
top-level scope at none:1
eval at ./boot.jl:331 [inlined]
#9 at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:510
#103 at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:290
run_work_thunk at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
run_work_thunk at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:88
#96 at ./task.jl:356
Stacktrace:
 [1] #remotecall_fetch#143 at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:394 [inlined]
 [2] remotecall_fetch(::Function, ::Distributed.Worker, ::Future) at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:386
 [3] remotecall_fetch(::Function, ::Int64, ::Future; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421
 [4] remotecall_fetch at /Users/viral/julia/usr/share/julia/stdlib/v1.5/Distributed/src/remotecall.jl:421 [inlined]
 [5] macro expansion at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:493 [inlined]
 [6] (::MPIClusterManagers.var"#25#28"{Future})() at ./task.jl:356

...and 3 more exception(s).

Stacktrace:
 [1] sync_end(::Channel{Any}) at ./task.jl:314
 [2] macro expansion at ./task.jl:333 [inlined]
 [3] mpi_do(::MPIManager, ::Function) at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:490
 [4] top-level scope at /Users/viral/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:511

Building Elemental on mac

Is building on a mac supported? The Elemental build failed with:

In file included from /Users/viral/.julia/v0.5/Elemental/deps/src/Elemental/src/blas_like/level1-C.cpp:9:
In file included from /Users/viral/.julia/v0.5/Elemental/deps/src/Elemental/include/El.hpp:15:
In file included from /Users/viral/.julia/v0.5/Elemental/deps/src/Elemental/include/El/core.hpp:182:
/Users/viral/.julia/v0.5/Elemental/deps/src/Elemental/include/El/core/Element/impl.hpp:562:43: error: 
      no matching function for call to 'round'
inline T Round( const T& alpha ) { return round(alpha); }
                                          ^~~~~

Segfault on PkgEval

Daily PkgEval testing of Elemental.jl has been segfaulting for a while, e.g., https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2022-11/24/Elemental.primary.log:

[ Info: Running Elemental.jl tests
worldSize=1

[130] signal (11.1): Segmentation fault
in expression starting at /home/pkgeval/.julia/packages/Elemental/SF7xW/test/lav.jl:78
_ZNK2El4Grid6VCSizeEv at /home/pkgeval/.julia/artifacts/8149f7b9fafd13aa486f176db1e7ecc5a987fb55/lib/libEl.so (unknown line)
unknown function (ip: 0x7efea5068cef)
Allocations: 631526 (Pool: 630967; Big: 559); GC: 1

This doesn't seem like a Julia issue, but an issue with this package or libEl. As it's been at the top of PkgEval reports, could you take a look? If necessary, you can easily reproduce the PkgEval sandbox environment (on Linux):

pkg> add https://github.com/JuliaCI/PkgEval.jl

julia> using PkgEval

julia> config = Configuration(julia="nightly")
PkgEval configuration 'unnamed' (
  - julia: nightly
  - ...
)

julia> PkgEval.sandboxed_julia(config)

# this spawns a sandbox where you can install and test packages

(@v1.10) pkg> add Elemental

(@v1.10) pkg> test Elemental

[196] signal (11.1): Segmentation fault

Single node out of core/on disk matrices?

Can this library be used for linear algebra on single-node, out-of-core data structures like Dato's Sarray, or on a database if we can represent the database as an abstract matrix? Maybe if there is some way to chunk up the data structure and stream it into memory?

README single processor svd example does not work

I'm using Julia 1.4 and Elemental.jl 0.5 (which downloaded and built Elemental through deps/build.jl)

julia> using LinearAlgebra, Elemental

julia> A = Elemental.Matrix(Float64)
0×0 Elemental.Matrix{Float64}

julia> Elemental.gaussian!(A, 100, 80);

julia> U, s, V = svd(A);

julia> convert(Matrix{Float64}, s)[1:10]
ERROR: MethodError: no method matching Array{Float64,2}(::Int64, ::Int64)
Closest candidates are:
  Array{Float64,2}(::UndefInitializer, ::Int64, ::Int64) where T at boot.jl:407
  Array{Float64,2}(::UndefInitializer, ::Int64...) where {T, N} at boot.jl:411
  Array{Float64,2}(::UndefInitializer, ::Integer, ::Integer) where T at baseext.jl:13
  ...
Stacktrace:
 [1] convert(::Type{Array{Float64,2}}, ::Elemental.Matrix{Float64}) at /home/viralbshah/.julia/packages/Elemental/ovOAZ/src/julia/generic.jl:114
 [2] top-level scope at REPL[19]:1

README examples do not work one after the other.

In the README, the first example and the second example do not compose; a new user would naturally try both and get frustrated.

Perhaps the example that nominally states "without MPI" has a uniprocessor MPI under the hood causing interference?

But still, this is frustrating as a user experience.

julia> using LinearAlgebra, Elemental

julia> A = Elemental.Matrix(Float64)
0×0 Elemental.Matrix{Float64}

julia> using MPI, MPIClusterManagers, Distributed

julia> man = MPIManager(np = 4);
ERROR: AssertionError: MPI.Initialized() == mgr.launched
Stacktrace:
 [1] MPIManager(; np::Int64, launch_timeout::Float64, mode::TransportMode, master_tcp_interface::String) at /Users/edelman/.julia/packages/MPIClusterManagers/0ZYYQ/src/mpimanager.jl:64
 [2] top-level scope at REPL[4]:1
