juliahep / jetreconstruction.jl Goto Github PK
View Code? Open in Web Editor NEWJet reconstruction (reclustering) with Julia
Home Page: https://juliahep.github.io/JetReconstruction.jl/
License: MIT License
Jet reconstruction (reclustering) with Julia
Home Page: https://juliahep.github.io/JetReconstruction.jl/
License: MIT License
We should have a default strategy (like Best
) where the number of input particles is used to pick the more optimal algorithm (FastJet does this). #26 would help us to empirically determine the breakpoints as a function of the density of initial particles.
I have found WGLMakie to be very flaky running in Jupyter and Pluto notebooks - cells will re-render graphics, but then get stalled for minutes while some weird stuff is happening under the hood. I even tried on a completely different machine with a fresh Julia install, but was able to see similar issues quite quickly.
GLMakie seems to be much more stable, so even if there is the inconvenience of a second window for the plot, it's much better for implementing the visualised examples.
For post-processing after the initial clustering, it is really necessary to have the history of how clusters were merged and access to all of the intermediate merged jets.
Therefore we need to expose the ClusterSequence
in the package and make this the actual return type.
Functions such as inclusive_jets(::ClusterSequence)
are then used to (in this case) return all of the final state jets.
Fixes #29
We have as ACSII HepMC3 reader that @grasph wrote. It's great for testing and benchmarking.
However, is there a better defined HepMC3 reader in Julia? I see that there is a HepMC3_jll, but I don't find any Julia interface package for it. Am I missing something? Does anyone know of plans to write an interface or support a native Julia reader?
The current example is a mish-mash of code profiling and performance measurements. That should stay, but a much simpler example should be given.
A few notebook examples would also be very nice to have.
There is a nice benchmark of 100 Pythia 8 events that @grasph generated that we use for correctness and benchmarking ATM.
I would propose that we extend this to two additional use cases:
In both cases FastJet should be used as a reference to ensure correctness.
We should also test other clustering strategies, at least Cambridge/Achen (
The current Julia implementation is significantly slower than C++ FastJet and I assume we can do better. Here you can see a plot that compares benchmarking results from an older version of this repo, which is already heavily optimised, ("Faster Julia" from about a week ago, showed in dark purple) with the currently used version of the algorithm ("New Faster Julia", in magenta). C++ FastJet is still way below. I did not include compilation time and used BenchmarkTools.jl for benchmarking. For the Julia approaches I've plotted the entire range of runtimes as well as the median (it's very close to the lower bound).
I cannot think of any other methods to optimise the code for now. The latest updates have already been inspired mostly by @Moelf 's suggestions (i.e. using @inbounds @simd for
, Base.@propagate_inbounds
, etc). In case anyone is able to spot other issues or knows ways to optimise the code with more or less native Julia methods, please let me know.
P.S. Profiling shows that most of the time we spend in the two following blocks of code:
...
## src/Algo.jl @ line 105
# initialize _nn
@simd for i in 1:N
_upd_nn_crosscheck!(i, 1, i-1, _eta, _phi, _R2, _nndist, _nn)
end
...
...
## src/Algo.jl @ line 165
# update nearest neighbours step
@inbounds @simd for k in 1:N
_upd_nn_step!(i, j, k, N, Nn, _kt2, _eta, _phi, _R2, _nndist, _nn, _nndij)
end
...
here's a dummy example:
using WGLMakie
begin
N = 100
δ = 0.02
cs = repeat(1:5, N÷5)
heights = rand(N) * 200
meshscatter(
# this is needed because Makie would flatten to 2D if all Zs are the same
Point3f.(rand(N), rand(N), 0.00001rand(N));
color = cs,
markersize = Vec3f.(δ, δ, heights),
colormap = :Spectral_7,
marker = Rect3f(Vec3f(0), Vec3f(1)),
axis = (type = Axis3, perspectiveness = 0.5, azimuth = 2.4,
xlabel = "ϕ", ylabel = "η", zlabel = "kt",
limits = (nothing, nothing, nothing, nothing, 0, 200)
),
shading=false
)
end
It should also work with GLMakie.jl, but 3D might be a bit broken with CairoMakie...
The README.md
file is ok to get people started, but we really need proper documentation done with Documeter.jl before we release this package.
The PseudoJet struct is extremely useful and works very well in this package. However, it has one defect when exposed publicly, viz. a data member which is completely for the internal use of the N2Tiled
algorithm, _cluster_hist_index
.
This should be removed to make the struct clean for export and an internal index added to be used during tiled reconstruction.
Currently the main interface to the two reconstruction strategies takes the power for the
However, this is only good for these
So it's better to switch to steering with the explicit algorithm name and not a power value as a proxy.
it looks like eventually we'd compare to FastJet's output, @jstrube has kindly wrapped the C++ library in Julia: https://github.com/jstrube/FastJet.jl . I also assume Jan is happy to help you with any missing features there.
Due to lack of docs might want to look at C++ FastJet documentation and compare to test:
https://github.com/jstrube/FastJet.jl/blob/ba5a1521430fbfecf3cfec977d450ae169063ee5/test/runtests.jl#L23
but we can imagine extending FastJet.jl to accept stuff from LorentzVectorHep.jl
At the moment N2Plain
and N2Tiled
return different representations of the merging sequence (tiled returns a FastJet inspired one, plain returns a simpler nested structure).
We should really make these the same (honestly, I don't understand so well the use cases for this, beyond debugging).
For FCCee studies this is a required algorithm (see FastJet manual sec. 4.6)
The code formatting is currently scrappy.
Should adopt a proper style for using JuliaFormatter
and also add a GitHub action to ensure code doesn't drift away from the project style.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
I've been trying to optimise the code for sequential recombination algorithm in Julia and spotted an interesting fact.
Below you can see a very simple piece of code that is basically a translation of the C++ FastJet code. The profiler shows that lines that only have nndist = Δ2
in them take up as much time as really computationally heavy lines like Δ2 = _dist(i, j, _eta, _phi)
. I checked the stack that the profiler throws out and it's actually calling the == operation inside of the iterate() function that causes that. So this probably comes not from the assignment itself but from the end of the loop iteration. Simple loops typically get optimised but here it's something strange.
I used Profile.jl and profiled the entire algorithm in a loop. This function is called from there frequently and only exists to be compiled to avoid dynamic dispatch in the main algorithm.
# src/Algo.jl @ line 45
function _upd_nn_nocross!(i::Int, from::Int, to::Int, _eta, _phi, _R2, _nndist, _nn)
nndist::Float64 = _R2
nn = i
Δ2::Float64 = 0.0
for j in from:(i-1)
Δ2 = _dist(i, j, _eta, _phi)
if Δ2 <= nndist
nn = j
nndist = Δ2 # too long for an assignment
end
end
for j in (i+1):to
Δ2 = _dist(i, j, _eta, _phi)
if Δ2 <= nndist
nn = j
nndist = Δ2 # too long for an assignment
end
end
_nndist[i] = nndist
_nn[i] = nn
nothing
end
The ::ClusterSequence.tiling
member is really an internal detail of the tiled algorithm, so it should be factored out and used as separate parameter in the tiled algorithm
We should define what the public interfaces are to the package. At the moment we have:
Defaults to a collection of arbitrary objects, where px
, py
, pz
, energy
are defined for each object (in JetReconstruction
, i.e., JetReconstruction.px(T)
) to extract the relevant 4-vector parameters.
I like that, but the default used in Particle.jl
is a little bit hokey for my taste as the index numbers are mapped to each quantity in a magic number kind of way.
A collection of PseudoJet
structs.
The thing I dislike currently is that the struct has an implementation specific parameter, viz. _cluster_hist_index
. It's also rather close to LorentzVectorHEP
, albeit that circular parameters are cached (not sure if we need that).
Standardise to accept input particles as a collection of LorentzVector
s, used via the LorentzVectorHEP
package, which provides the appropriate px
, py
, pz
, energy
definitions already. This also achieves #5.
T
the idea of defining JetReconstruction.px{T}
, etc. that provides a clear way for users to use their own types with the package.It is probably optimal that each algorithm uses an internal EDM when users pass arbitrary types, as these can be tuned for internal efficiency (e.g., the N2Tiled method can keep using its current PseudoJet
).
N2Plain returns the same type of jet objects that it was given as input. N2Tiled returns its own PseudoJet
objects.
Also the sequences are different, with N2Tiled returning a FastJet-like clusterisation history and N2Plain a vector of ints, giving the merging index sequence for each input particle (and intermediates).
N2Tiled also filters on
I am really not sure what's best here - uniformly return something like LorentzVectorCyl
, which gives
ATM, the algorithm strategy is manual - --strategy=N2Plain
in chep.jl
. The string is mapped to an enum
in Julia, but there is an annoying manual mapping from the option string to the enum. I could not find a way to extract a nice string from the enum names, did I miss something?
However, reading about this issue, it seems people often use types in these cases, not enums. Should see if that is a nicer way. If a concrete type was used to map to each algorithm it could contain a user friendly string for the CLI.
Based on the new ClusterSequence
return type, develop plotting routines to visualise the output of the reconstruction
This is the basis of pruning and cleaning jets after the first reconstruction pass, so we need this
For FCCee studies this is a required algorithm (see FastJet manual sec. 4.5)
Is there way to somehow make the algorithms GPU-able?
We have a bunch of code that came from AntiKt,jl that is licensed under GPL. The rest of the package is under the more permissive MIT license (more popular in the Julia community IMO).
Current CERN and HSF advice is to not use viral licenses and, specifically, ATLAS and CMS disfavour them (they use Apache2).
@grasph - would you be willing to relicense the code under an MIT license?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.