tlienart commented on June 16, 2024

In the initialisation, the loop is:

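```julia
for (fidx, factor) in enumerate(sim.fg.factors)
    vars   = assocvariables(sim.fg, fidx)   # variables attached to this factor
    xf, vf = sim.x0[vars], sim.v0[vars]     # their initial positions and velocities
    g      = factor.gll(vcat(xf...))        # factor.gll evaluated at the stacked positions
    pq     = ls_updatepq!(pq, sim.fg, fidx, xf, vf, g, 0.0)  # bouncing time stored in PQ at t = 0
end
```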

A very similar loop runs in the body of the main function. All the iterations are independent; each one essentially does:

a. look at the current state of the nodes around the factor
b. compute a bouncing time for that factor
c. store the bouncing time in the PQ

Steps (a) and (b) should be done in parallel; the returned collection of times can then be fed into the PQ.

action

  • write a function, similar to ls_updatepq!, that just returns the bouncing time (it will ultimately replace ls_updatepq!)
  • collect the bouncing times and store them in the PQ.
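A minimal sketch of that plan, assuming Julia 1.x: each factor's bouncing time is computed in parallel into its own slot of a preallocated vector (so no two threads write to the same memory), and the times are then inserted into the queue serially. For illustration only, every factor is treated as a standard Gaussian, whose bouncing time has a closed form; `gauss_bounce_time`, `all_bounce_times` and `build_queue` are hypothetical names, and a sorted vector of `(time, factor)` pairs stands in for the `PriorityQueue`. The exponential draws are made serially beforehand, which also keeps random number generation out of the threaded loop.

```julia
using LinearAlgebra: dot

# Illustrative bouncing time for a standard Gaussian factor U(x) = |x|^2 / 2:
# along x + s*v the event rate is max(0, a + b*s) with a = <x,v>, b = <v,v>,
# and the bouncing time t solves ∫_0^t rate(s) ds = E for a draw E ~ Exp(1).
function gauss_bounce_time(xf::Vector{Float64}, vf::Vector{Float64}, E::Float64)
    a, b = dot(xf, vf), dot(vf, vf)
    a >= 0 ? (-a + sqrt(a^2 + 2 * b * E)) / b : -a / b + sqrt(2 * E / b)
end

# Steps (a) + (b) in parallel: each iteration writes only its own slot of `times`.
function all_bounce_times(xs, vs, Es)
    times = Vector{Float64}(undef, length(xs))
    Threads.@threads for fidx in eachindex(xs)
        times[fidx] = gauss_bounce_time(xs[fidx], vs[fidx], Es[fidx])
    end
    return times
end

# Step (c) serially: feed the collected times into the event queue
# (here a vector of (time, factor) pairs sorted by time).
build_queue(times) = sort!([(t, fidx) for (fidx, t) in enumerate(times)])

xs = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0]]
vs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
Es = [7.5, 0.5, 16.0]   # exponential draws, made serially beforehand
queue = build_queue(all_bounce_times(xs, vs, Es))
```

With these inputs the times are 3.0, 2.0 and 4.0, so factor 2 is at the head of the queue. Only the final insertion step touches shared state, which is why it is kept out of the threaded loop.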

from pdsampler.jl.

tlienart commented on June 16, 2024

Since Julia's distributed model treats every worker process as potentially running on a separate machine, all the relevant data also has to be sent to the worker so that it can do its computation and then send the result back. So the function has to be modified a bit more to ship all the relevant state along with it, which will probably make the code somewhat uglier.
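A minimal sketch of that pattern with the Distributed standard library, assuming Julia 1.x: the function is defined on every process with `@everywhere`, and `pmap` serializes each task tuple to a worker and sends the result back, so each task carries only its own `(xf, vf, E)` slice rather than the whole factor graph. `local_bounce_time` is a hypothetical stand-in that, for illustration, uses the closed-form bouncing time of a standard Gaussian factor. Without `addprocs`, `pmap` simply runs on the master process.

```julia
using Distributed
# addprocs(4)  # uncomment to start local worker processes (before @everywhere)

# Workers share no memory with the master: the function must exist on every
# process, and each task must carry all the data it needs.
@everywhere function local_bounce_time(task)
    xf, vf, E = task                      # the only state shipped to the worker
    a, b = sum(xf .* vf), sum(vf .* vf)   # <x,v> and <v,v>
    # closed-form bouncing time for a standard Gaussian factor, given E ~ Exp(1)
    a >= 0 ? (-a + sqrt(a^2 + 2 * b * E)) / b : -a / b + sqrt(2 * E / b)
end

# One tuple per factor: each is serialized to a worker and the result sent back.
tasks = [([1.0, 0.0], [1.0, 0.0], 7.5), ([-1.0, 0.0], [1.0, 0.0], 0.5)]
times = pmap(local_bounce_time, tasks)
```

The serialization cost grows with the size of each tuple, which is exactly the overhead that may swamp the gain when the per-factor computation is cheap.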

tlienart commented on June 16, 2024

So, apart from wasting a lot of time on poorly documented resources, I didn't get very far.

Inherently, data has to be moved around to the other physical cores. It's unclear to me whether the speedup from doing things "in parallel" would be completely overwhelmed by the cost of copying data structures around.

I've tried using a DistributedArray with a @parallel for loop but didn't get anywhere, and I did not find good examples of similar use online. I'm putting this aside for now while waiting for feedback from Leonard, who has some experience with that kind of stuff.

tlienart commented on June 16, 2024

Actually, there seems to be capacity for shared-memory multithreading using the @threads macro and the related Threads module. Links to look up:

I need to figure out a simple example with a clear speedup first, then test it in our environment.

tlienart commented on June 16, 2024

In bash:

```bash
export JULIA_NUM_THREADS=4
```

as per https://docs.julialang.org/en/latest/manual/parallel-computing#Multi-Threading-(Experimental)-1
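To check that the setting was picked up, the thread count can be queried from Julia itself. On Julia 1.5 and later, starting Julia with `julia --threads 4` (or `-t 4`) is an equivalent alternative to the environment variable. A minimal check:

```julia
# Number of threads available to Threads.@threads loops in this session;
# a value of 1 means threaded loops will effectively run serially.
n = Threads.nthreads()
println("Julia is running with $n thread(s)")
```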

then in Julia:

```julia
f(x) = sin(x^2)

function trial(N::Int64)
    r = rand(N)
    x = similar(r)

    println("basic")
    @time for k in eachindex(r)
        x[k] = f(r[k])
    end

    # @time x = f.(r)   # broadcasting
    # @time x .= f.(r)  # in-place broadcasting

    println("threaded")
    y = similar(r)  # have to make a new array here; it also lets us compare against x
    @time Threads.@threads for k in eachindex(r)
        y[k] = f(r[k])
    end

    return x == y   # serial and threaded results must agree
end

trial(1000)
trial(10^8)
trial(10^8)
```

Observed result:

```
julia> trial(1000)
basic
  0.000025 seconds
threaded
  0.024379 seconds (6.79 k allocations: 293.707 KB)
true

julia> trial(10^8)
basic
  1.843685 seconds
threaded
  0.788258 seconds (2 allocations: 64 bytes)
true

julia> trial(10^8)
basic
  1.794225 seconds
threaded
  0.760491 seconds (2 allocations: 64 bytes)
true
```

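So, a non-negligible speedup of a little over 2× for N = 10^8 (1.84 s vs 0.79 s and 1.79 s vs 0.76 s, i.e. about 2.3×). For N = 1000, however, the threaded loop is much slower (0.024 s vs 0.000025 s), so threading only pays off on large enough workloads.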
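The first threaded timing for `trial(1000)`, with its 6.79 k allocations, most likely also includes compiling the threaded loop. A sketch that keeps compilation out of the comparison by running both versions once before timing them with `@elapsed`, assuming Julia 1.x; `serial!`, `threaded!` and `compare` are illustrative names, not part of the package:

```julia
f(x) = sin(x^2)

# Serial reference loop, writing into a preallocated output.
function serial!(y, r)
    for k in eachindex(r)
        y[k] = f(r[k])
    end
    return y
end

# Same loop, split across the available threads.
function threaded!(y, r)
    Threads.@threads for k in eachindex(r)
        y[k] = f(r[k])
    end
    return y
end

function compare(N)
    r = rand(N)
    x, y = similar(r), similar(r)
    serial!(x, r); threaded!(y, r)   # warm-up: compile both versions first
    ts = @elapsed serial!(x, r)
    tt = @elapsed threaded!(y, r)
    return (speedup = ts / tt, same = x == y)
end

res = compare(10^6)
println("threads = ", Threads.nthreads(), ", speedup ≈ ", round(res.speedup, digits = 2))
```

The measured ratio depends on the machine and on `Threads.nthreads()`; with a single thread it should hover around 1.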

tlienart commented on June 16, 2024

So I've implemented and tested this in the "first branch" section:

```julia
function ls_firstbranch!(fg::FactorGraph, fidx::Int, all_evlist::AllEventList,
                         pq::PriorityQueue, t::Float
                         )::Tuple{AllEventList,PriorityQueue}
    # retrieve xf, vf corresponding to factor
    (xf, vf, g, vars) = ls_retrieve(fg, fidx, all_evlist, t, true)
    ls_saveupdate!(all_evlist, vars, xf, vf, t)
    ls_updatepq!(pq, fg, fidx, xf, vf, g, t)
    # same story for linked factors (fp), spread across threads
    lk = ReentrantLock()  # pq is shared by all threads: serialize its updates
    Threads.@threads for fpidx in linkedfactors(fg, fidx)
        # we don't need to retrieve `vars` here
        (xfp, vfp, gp) = ls_retrieve(fg, fpidx, all_evlist, t)
        lock(lk) do
            ls_updatepq!(pq, fg, fpidx, xfp, vfp, gp, t)
        end
    end
    (all_evlist, pq)
end
```

This "works". It remains to be tested extensively to see whether it brings a speedup.
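Because `pq` is shared by all the threads in a loop like this one, its updates must be serialized: a `PriorityQueue` is an ordinary mutable structure with no internal locking. A minimal sketch of the pattern, assuming Julia 1.3 or later (where `ReentrantLock` is thread-safe): the expensive per-factor work stays parallel and only the mutation happens under the lock. The `Dict` stands in for the queue, and `expensive_time` and `update_all!` are hypothetical names.

```julia
# Hypothetical stand-in for the expensive, independent per-factor computation.
expensive_time(fidx) = sum(sin(fidx * k)^2 for k in 1:1_000)

function update_all!(queue::Dict{Int,Float64}, fidxs)
    lk = ReentrantLock()
    Threads.@threads for fidx in fidxs
        t = expensive_time(fidx)   # computed in parallel, touches no shared state
        lock(lk) do                # only the shared mutation is serialized
            queue[fidx] = t
        end
    end
    return queue
end

queue = update_all!(Dict{Int,Float64}(), collect(1:100))
```

If the locked section does most of the work, the threads mostly wait on each other, so this only helps when the unlocked part dominates.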

tlienart commented on June 16, 2024
  • randn locks threading (the global RNG is shared by all threads); this can be circumvented by calling randn with explicitly seeded RNGs
  • allocations lock threading too; this is harder to circumvent, especially when calling a stack of functions

-> making multithreading truly work here would require significant rethinking, so I'm dropping the issue for now.
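A minimal sketch of the seeded-RNG workaround, assuming Julia 1.x: the work is split into a fixed number of chunks, each with its own explicitly seeded `MersenneTwister` and writing into a preallocated output, so the threads neither share the global RNG nor allocate inside the loop. `chunked_randn` is a hypothetical name. Since Julia 1.7 the default RNG is task-local, but explicit per-chunk generators also make the output reproducible regardless of the number of threads.

```julia
using Random

# One explicitly seeded RNG per chunk of work: no thread touches the global RNG,
# and each chunk always uses the same generator whichever thread runs it.
function chunked_randn(n::Int, nchunks::Int, seed::Int)
    out = Vector{Float64}(undef, n)                      # preallocated output
    chunks = collect(Iterators.partition(1:n, cld(n, nchunks)))
    rngs = [MersenneTwister(seed + c) for c in eachindex(chunks)]
    Threads.@threads for c in eachindex(chunks)
        rng = rngs[c]
        for i in chunks[c]
            out[i] = randn(rng)                          # no heap allocation here
        end
    end
    return out
end

a = chunked_randn(10_000, 8, 42)
b = chunked_randn(10_000, 8, 42)
```

Running it twice with the same seed gives identical arrays, whatever `JULIA_NUM_THREADS` is set to.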
