juliahep / jetreconstruction.jl Goto Github PK
View Code? Open in Web Editor NEWJet reconstruction (reclustering) with Julia
License: MIT License
Jet reconstruction (reclustering) with Julia
License: MIT License
We have as ACSII HepMC3 reader that @grasph wrote. It's great for testing and benchmarking.
However, is there a better defined HepMC3 reader in Julia? I see that there is a HepMC3_jll, but I don't find any Julia interface package for it. Am I missing something? Does anyone know of plans to write an interface or support a native Julia reader?
We should have a default strategy (like Best
) where the number of input particles is used to pick the more optimal algorithm (FastJet does this). #26 would help us to empirically determine the breakpoints as a function of the density of initial particles.
There is a nice benchmark of 100 Pythia 8 events that @grasph generated that we use for correctness and benchmarking ATM.
I would propose that we extend this to two additional use cases:
In both cases FastJet should be used as a reference to ensure correctness.
We should also test other clustering strategies, at least Cambridge/Achen (
At the moment N2Plain
and N2Tiled
return different representations of the merging sequence (tiled returns a FastJet inspired one, plain returns a simpler nested structure).
We should really make these the same (honestly, I don't understand so well the use cases for this, beyond debugging).
here's a dummy example:
using WGLMakie
begin
N = 100
δ = 0.02
cs = repeat(1:5, N÷5)
heights = rand(N) * 200
meshscatter(
# this is needed because Makie would flatten to 2D if all Zs are the same
Point3f.(rand(N), rand(N), 0.00001rand(N));
color = cs,
markersize = Vec3f.(δ, δ, heights),
colormap = :Spectral_7,
marker = Rect3f(Vec3f(0), Vec3f(1)),
axis = (type = Axis3, perspectiveness = 0.5, azimuth = 2.4,
xlabel = "ϕ", ylabel = "η", zlabel = "kt",
limits = (nothing, nothing, nothing, nothing, 0, 200)
),
shading=false
)
end
It should also work with GLMakie.jl, but 3D might be a bit broken with CairoMakie...
I've been trying to optimise the code for sequential recombination algorithm in Julia and spotted an interesting fact.
Below you can see a very simple piece of code that is basically a translation of the C++ FastJet code. The profiler shows that lines that only have nndist = Δ2
in them take up as much time as really computationally heavy lines like Δ2 = _dist(i, j, _eta, _phi)
. I checked the stack that the profiler throws out and it's actually calling the == operation inside of the iterate() function that causes that. So this probably comes not from the assignment itself but from the end of the loop iteration. Simple loops typically get optimised but here it's something strange.
I used Profile.jl and profiled the entire algorithm in a loop. This function is called from there frequently and only exists to be compiled to avoid dynamic dispatch in the main algorithm.
# src/Algo.jl @ line 45
function _upd_nn_nocross!(i::Int, from::Int, to::Int, _eta, _phi, _R2, _nndist, _nn)
nndist::Float64 = _R2
nn = i
Δ2::Float64 = 0.0
for j in from:(i-1)
Δ2 = _dist(i, j, _eta, _phi)
if Δ2 <= nndist
nn = j
nndist = Δ2 # too long for an assignment
end
end
for j in (i+1):to
Δ2 = _dist(i, j, _eta, _phi)
if Δ2 <= nndist
nn = j
nndist = Δ2 # too long for an assignment
end
end
_nndist[i] = nndist
_nn[i] = nn
nothing
end
The current Julia implementation is significantly slower than C++ FastJet and I assume we can do better. Here you can see a plot that compares benchmarking results from an older version of this repo, which is already heavily optimised, ("Faster Julia" from about a week ago, showed in dark purple) with the currently used version of the algorithm ("New Faster Julia", in magenta). C++ FastJet is still way below. I did not include compilation time and used BenchmarkTools.jl for benchmarking. For the Julia approaches I've plotted the entire range of runtimes as well as the median (it's very close to the lower bound).
I cannot think of any other methods to optimise the code for now. The latest updates have already been inspired mostly by @Moelf 's suggestions (i.e. using @inbounds @simd for
, Base.@propagate_inbounds
, etc). In case anyone is able to spot other issues or knows ways to optimise the code with more or less native Julia methods, please let me know.
P.S. Profiling shows that most of the time we spend in the two following blocks of code:
...
## src/Algo.jl @ line 105
# initialize _nn
@simd for i in 1:N
_upd_nn_crosscheck!(i, 1, i-1, _eta, _phi, _R2, _nndist, _nn)
end
...
...
## src/Algo.jl @ line 165
# update nearest neighbours step
@inbounds @simd for k in 1:N
_upd_nn_step!(i, j, k, N, Nn, _kt2, _eta, _phi, _R2, _nndist, _nn, _nndij)
end
...
We should define what the public interfaces are to the package. At the moment we have:
Defaults to a collection of arbitrary objects, where px
, py
, pz
, energy
are defined for each object (in JetReconstruction
, i.e., JetReconstruction.px(T)
) to extract the relevant 4-vector parameters.
I like that, but the default used in Particle.jl
is a little bit hokey for my taste as the index numbers are mapped to each quantity in a magic number kind of way.
A collection of PseudoJet
structs.
The thing I dislike currently is that the struct has an implementation specific parameter, viz. _cluster_hist_index
. It's also rather close to LorentzVectorHEP
, albeit that circular parameters are cached (not sure if we need that).
Standardise to accept input particles as a collection of LorentzVector
s, used via the LorentzVectorHEP
package, which provides the appropriate px
, py
, pz
, energy
definitions already. This also achieves #5.
T
the idea of defining JetReconstruction.px{T}
, etc. that provides a clear way for users to use their own types with the package.It is probably optimal that each algorithm uses an internal EDM when users pass arbitrary types, as these can be tuned for internal efficiency (e.g., the N2Tiled method can keep using its current PseudoJet
).
N2Plain returns the same type of jet objects that it was given as input. N2Tiled returns its own PseudoJet
objects.
Also the sequences are different, with N2Tiled returning a FastJet-like clusterisation history and N2Plain a vector of ints, giving the merging index sequence for each input particle (and intermediates).
N2Tiled also filters on
I am really not sure what's best here - uniformly return something like LorentzVectorCyl
, which gives
ATM, the algorithm strategy is manual - --strategy=N2Plain
in chep.jl
. The string is mapped to an enum
in Julia, but there is an annoying manual mapping from the option string to the enum. I could not find a way to extract a nice string from the enum names, did I miss something?
However, reading about this issue, it seems people often use types in these cases, not enums. Should see if that is a nicer way. If a concrete type was used to map to each algorithm it could contain a user friendly string for the CLI.
We have a bunch of code that came from AntiKt,jl that is licensed under GPL. The rest of the package is under the more permissive MIT license (more popular in the Julia community IMO).
Current CERN and HSF advice is to not use viral licenses and, specifically, ATLAS and CMS disfavour them (they use Apache2).
@grasph - would you be willing to relicense the code under an MIT license?
it looks like eventually we'd compare to FastJet's output, @jstrube has kindly wrapped the C++ library in Julia: https://github.com/jstrube/FastJet.jl . I also assume Jan is happy to help you with any missing features there.
Due to lack of docs might want to look at C++ FastJet documentation and compare to test:
https://github.com/jstrube/FastJet.jl/blob/ba5a1521430fbfecf3cfec977d450ae169063ee5/test/runtests.jl#L23
but we can imagine extending FastJet.jl to accept stuff from LorentzVectorHep.jl
The PseudoJet struct is extremely useful and works very well in this package. However, it has one defect when exposed publicly, viz. a data member which is completely for the internal use of the N2Tiled
algorithm, _cluster_hist_index
.
This should be removed to make the struct clean for export and an internal index added to be used during tiled reconstruction.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.