Comments (7)
In the initialisation, the loop is:
for (fidx, factor) in enumerate(sim.fg.factors)
    vars = assocvariables(sim.fg, fidx)
    xf, vf = sim.x0[vars], sim.v0[vars]
    g = factor.gll(vcat(xf...))
    pq = ls_updatepq!(pq, sim.fg, fidx, xf, vf, g, 0.0)
end
A very similar loop happens in the body of the main function. All the operations are independent. Essentially:
a. look at the current state of the nodes around each factor
b. compute a bouncing time for that factor
c. store the bouncing time in the PQ
Steps (a) and (b) should be done in parallel. The returned collection of times can then be fed into the PQ.
Action:
- write a function that will ultimately replace ls_updatepq! with a similar function that just returns the bouncing time
- collect the bouncing times and store them in the PQ.
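A minimal sketch of that split, assuming Julia 1.x and a toy `bounce_time` that stands in for the real per-factor computation (both `bounce_time` and `all_bounce_times` are hypothetical names, not part of pdsampler.jl). Steps (a) and (b) run in a threaded loop where each iteration writes only to its own slot, and step (c) is done serially afterwards, here with a sorted vector standing in for the PQ:

```julia
using Base.Threads

# Hypothetical stand-in for steps (a) and (b): first positive time at which
# x + t*v reaches zero, or Inf if that never happens.
function bounce_time(x::Float64, v::Float64)
    t = -x / v
    return t > 0 ? t : Inf
end

# Compute all bouncing times in parallel; no shared state is mutated
# apart from each iteration's own entry of `times`.
function all_bounce_times(xs::Vector{Float64}, vs::Vector{Float64})
    times = Vector{Float64}(undef, length(xs))
    @threads for i in eachindex(xs)
        times[i] = bounce_time(xs[i], vs[i])
    end
    return times
end

xs = [1.0, -2.0, 0.5]
vs = [-1.0, 1.0, 1.0]
times = all_bounce_times(xs, vs)

# Step (c), done serially: pair each time with its factor index and sort.
# The sorted vector is only a stand-in for the real priority queue.
queue = sort!(collect(zip(times, eachindex(times))))
```

Keeping the queue update outside the threaded loop means the queue itself never has to be thread-safe.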
from pdsampler.jl.
Since Julia takes every processor to potentially be on an independent machine, one also needs to send all of the relevant data to it so that it can do its computation and then send it back. So we have to modify the function a bit more and send all the relevant stuff with it. This will probably make the code somewhat uglier.
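To illustrate what "sending the relevant stuff" could look like, here is a rough sketch using `pmap` from the `Distributed` standard library (Julia 1.x assumed). The `bounce_time` function is a hypothetical toy, not the package's computation. Each task receives its inputs as an argument, so they are serialised to whichever process runs the task and only a single number is sent back:

```julia
using Distributed

# Hypothetical per-factor computation; @everywhere defines it on every process.
@everywhere bounce_time(x, v) = (t = -x / v; t > 0 ? t : Inf)

xs = [1.0, -2.0, 0.5]
vs = [-1.0, 1.0, 1.0]

# Each (x, v) pair travels with its task; with no extra workers added,
# pmap simply runs everything on the current process.
times = pmap(xv -> bounce_time(xv[1], xv[2]), collect(zip(xs, vs)))
```

For a computation this cheap, the serialisation cost would almost certainly dominate, which is the concern raised in this thread.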
So apart from wasting a lot of time on resources with poor documentation I didn't get very far.
Inherently there's a requirement of moving stuff around to the other physical cores. It's unclear to me whether the speedup from doing things "in parallel" would not be completely overwhelmed by the cost of copying data structures around.
I've tried using a distributed array with a @parallel for loop but didn't get anywhere, and did not find good examples of similar use online. I'm putting this aside for now while waiting for feedback from Leonard, who has some experience with that kind of stuff.
Actually there seems to be capacity for multithreading with shared memory using the @threads macro and related Threads functionality. Links to look up:
I need to figure out a simple example with a clear speedup first then test it in our environment.
In bash
export JULIA_NUM_THREADS=4
as per https://docs.julialang.org/en/latest/manual/parallel-computing#Multi-Threading-(Experimental)-1
then in julia
f(x) = sin(x^2)
function trial(N::Int64)
    r = rand(N)
    x = similar(r)
    println("basic")
    @time for k in eachindex(r)
        x[k] = f(r[k])
    end
    # @time x = f.(r) # broadcasting
    # @time x .= f.(r) # in-place broadcasting
    println("threaded")
    y = similar(r) # have to make a new array here for some reason
    @time Threads.@threads for k in eachindex(r)
        y[k] = f(r[k])
    end
    return x == y
end
trial(1000)
trial(10^8)
trial(10^8)
Observed result
julia> trial(1000)
basic
0.000025 seconds
threaded
0.024379 seconds (6.79 k allocations: 293.707 KB)
true
julia> trial(10^8)
basic
1.843685 seconds
threaded
0.788258 seconds (2 allocations: 64 bytes)
true
julia> trial(10^8)
basic
1.794225 seconds
threaded
0.760491 seconds (2 allocations: 64 bytes)
true
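So a non-negligible speedup of about 2.3x on the large input (roughly 1.8 s down to 0.8 s with 4 threads). For N = 1000 the threaded version is slower, though that first timing probably includes compilation of the threaded loop.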
So I've implemented and tested this in the "first branch" section
function ls_firstbranch!(fg::FactorGraph, fidx::Int, all_evlist::AllEventList,
                         pq::PriorityQueue, t::Float
                         )::Tuple{AllEventList,PriorityQueue}
    # retrieve xf, vf corresponding to factor
    (xf, vf, g, vars) = ls_retrieve(fg, fidx, all_evlist, t, true)
    ls_saveupdate!(all_evlist, vars, xf, vf, t)
    ls_updatepq!(pq, fg, fidx, xf, vf, g, t)
    # same story for linked factors (fp)
    Threads.@threads for fpidx in linkedfactors(fg, fidx)
        # we don't need to retrieve `vars` here
        (xfp, vfp, gp) = ls_retrieve(fg, fpidx, all_evlist, t)
        ls_updatepq!(pq, fg, fpidx, xfp, vfp, gp, t)
    end
    (all_evlist, pq)
end
This "works". It remains to be tested extensively to see whether it brings a speedup.
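One thing to check when testing: the threaded loop calls ls_updatepq! on the same pq from several threads, and a plain priority queue is generally not safe to mutate concurrently. A possible guard, sketched here under assumptions (Julia 1.x, a `Dict` standing in for pq, a placeholder formula standing in for the real bouncing-time computation, and a hypothetical `update_queue!`), is to keep the expensive part outside a lock and serialise only the write:

```julia
using Base.Threads

# Hypothetical stand-in: `queue` plays the role of pq and the formula for `τ`
# is a placeholder for the per-factor bouncing-time computation.
function update_queue!(queue::Dict{Int,Float64}, factors::Vector{Int}, t::Float64)
    lk = ReentrantLock()
    @threads for i in eachindex(factors)
        fidx = factors[i]
        τ = t + 1.0 / fidx      # computed in parallel, touches no shared state
        lock(lk) do
            queue[fidx] = τ     # only the shared write is serialised
        end
    end
    return queue
end

queue = update_queue!(Dict{Int,Float64}(), [1, 2, 4], 0.5)
```

Whether this pays off depends on how much time is spent computing relative to the time spent waiting on the lock.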
- randn locks threading; this can be circumvented by giving randn an explicitly seeded generator
- allocations lock threading; this is harder to circumvent, especially when calling a stack of functions
-> making multithreading truly work here would require significant thinking, so I'm dropping the issue for now.
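For the first point, a rough sketch of the seeded approach, assuming Julia 1.x (the `threaded_randn` helper and its chunking scheme are illustrative, not taken from pdsampler.jl). Each chunk of work gets its own explicitly seeded generator, so no thread has to share a global one:

```julia
using Random
using Base.Threads

# Hypothetical helper: fill a vector with normal draws, one RNG per chunk.
function threaded_randn(n::Int, nchunks::Int; seed::Int = 1234)
    out = Vector{Float64}(undef, n)
    chunks = collect(Iterators.partition(1:n, cld(n, nchunks)))
    @threads for c in eachindex(chunks)
        rng = MersenneTwister(seed + c)   # independent generator for this chunk
        for k in chunks[c]
            out[k] = randn(rng)
        end
    end
    return out
end

a = threaded_randn(1000, 4)
b = threaded_randn(1000, 4)
```

Because the seed depends only on the chunk index, the result is reproducible regardless of how many threads are available.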