juliahep / jetreconstruction.jl Goto Github PK

View Code? Open in Web Editor NEW

17.0 3.0 3.0 21.96 MB

Jet reconstruction (reclustering) with Julia

Home Page: https://juliahep.github.io/JetReconstruction.jl/

License: MIT License

Julia 100.00%

hep julia physics jet-reclustering jet-finding jet-reconstruction

jetreconstruction.jl's Issues

Defaut strategy

We should have a default strategy (like Best) where the number of input particles is used to pick the more optimal algorithm (FastJet does this). #26 would help us to empirically determine the breakpoints as a function of the density of initial particles.

Use GLMakie for on-screen example plots

I have found WGLMakie to be very flaky running in Jupyter and Pluto notebooks - cells will re-render graphics, but then get stalled for minutes while some weird stuff is happening under the hood. I even tried on a completely different machine with a fresh Julia install, but was able to see similar issues quite quickly.

GLMakie seems to be much more stable, so even if there is the inconvenience of a second window for the plot, it's much better for implementing the visualised examples.

Return type should be a ClusterSequence

For post-processing after the initial clustering, it is really necessary to have the history of how clusters were merged and access to all of the intermediate merged jets.

Therefore we need to expose the ClusterSequence in the package and make this the actual return type.

Functions such as inclusive_jets(::ClusterSequence) are then used to (in this case) return all of the final state jets.

Fixes #29

HepMC3 Reader Status

We have as ACSII HepMC3 reader that @grasph wrote. It's great for testing and benchmarking.

However, is there a better defined HepMC3 reader in Julia? I see that there is a HepMC3_jll, but I don't find any Julia interface package for it. Am I missing something? Does anyone know of plans to write an interface or support a native Julia reader?

Simpler examples

The current example is a mish-mash of code profiling and performance measurements. That should stay, but a much simpler example should be given.

A few notebook examples would also be very nice to have.

Move project to JuliaHEP organization?

Improve tests

There is a nice benchmark of 100 Pythia 8 events that @grasph generated that we use for correctness and benchmarking ATM.

I would propose that we extend this to two additional use cases:

low density FCCee type events (where N2Plain might outperform N2Tiled)
very high density HI type events (which will just be challenging)

In both cases FastJet should be used as a reference to ensure correctness.

We should also test other clustering strategies, at least Cambridge/Achen ($p=0$) and Inclusive $k_T$ ($p=1$).

Optimisation needed

The current Julia implementation is significantly slower than C++ FastJet and I assume we can do better. Here you can see a plot that compares benchmarking results from an older version of this repo, which is already heavily optimised, ("Faster Julia" from about a week ago, showed in dark purple) with the currently used version of the algorithm ("New Faster Julia", in magenta). C++ FastJet is still way below. I did not include compilation time and used BenchmarkTools.jl for benchmarking. For the Julia approaches I've plotted the entire range of runtimes as well as the median (it's very close to the lower bound).

I cannot think of any other methods to optimise the code for now. The latest updates have already been inspired mostly by @Moelf 's suggestions (i.e. using @inbounds @simd for, Base.@propagate_inbounds, etc). In case anyone is able to spot other issues or knows ways to optimise the code with more or less native Julia methods, please let me know.

P.S. Profiling shows that most of the time we spend in the two following blocks of code:

...
## src/Algo.jl @ line 105 
# initialize _nn
@simd for i in 1:N
    _upd_nn_crosscheck!(i, 1, i-1, _eta, _phi, _R2, _nndist, _nn)
end
...

...
## src/Algo.jl @ line 165 
# update nearest neighbours step
@inbounds @simd for k in 1:N
    _upd_nn_step!(i, j, k, N, Nn, _kt2, _eta, _phi, _R2, _nndist, _nn, _nndij)
end
...

Use Makie for 3D barplot

here's a dummy example:

using WGLMakie
begin
	N = 100
	δ = 0.02
	cs = repeat(1:5, N÷5)
	heights = rand(N) * 200
	meshscatter(
                                # this is needed because Makie would flatten to 2D if all Zs are the same
		Point3f.(rand(N), rand(N), 0.00001rand(N)); 
	  	color = cs, 
		markersize = Vec3f.(δ, δ, heights),
		colormap = :Spectral_7,
		marker = Rect3f(Vec3f(0), Vec3f(1)),
	 	axis = (type = Axis3, perspectiveness = 0.5, azimuth = 2.4,
            xlabel = "ϕ", ylabel = "η", zlabel = "kt",
		    limits = (nothing, nothing, nothing, nothing, 0, 200)
		),
	    shading=false
	)
end

It should also work with GLMakie.jl, but 3D might be a bit broken with CairoMakie...

Write proper documentation

The README.md file is ok to get people started, but we really need proper documentation done with Documeter.jl before we release this package.

Remove internal data from PseudoJet

The PseudoJet struct is extremely useful and works very well in this package. However, it has one defect when exposed publicly, viz. a data member which is completely for the internal use of the N2Tiled algorithm, _cluster_hist_index.

This should be removed to make the struct clean for export and an internal index added to be used during tiled reconstruction.

Invoke algorithm by enum, not by power value

Currently the main interface to the two reconstruction strategies takes the power for the $d_{ij}$ metric as an argument, which then switches between the anti-kt, CA and kt versions (-1, 0, 1).

However, this is only good for these $pp$ algorithms and doesn't account for, e.g., the $e^+e^-$ algorithms, such as the generalised $k_T$.

So it's better to switch to steering with the explicit algorithm name and not a power value as a proxy.

Use `FastJet.jl` for comparison later

it looks like eventually we'd compare to FastJet's output, @jstrube has kindly wrapped the C++ library in Julia: https://github.com/jstrube/FastJet.jl . I also assume Jan is happy to help you with any missing features there.

Due to lack of docs might want to look at C++ FastJet documentation and compare to test:
https://github.com/jstrube/FastJet.jl/blob/ba5a1521430fbfecf3cfec977d450ae169063ee5/test/runtests.jl#L23

but we can imagine extending FastJet.jl to accept stuff from LorentzVectorHep.jl

Return consistent sequence types

At the moment N2Plain and N2Tiled return different representations of the merging sequence (tiled returns a FastJet inspired one, plain returns a simpler nested structure).

We should really make these the same (honestly, I don't understand so well the use cases for this, beyond debugging).

Add kt algorithm for e+e−

For FCCee studies this is a required algorithm (see FastJet manual sec. 4.6)

More regular code formatting

The code formatting is currently scrappy.

Should adopt a proper style for using JuliaFormatter and also add a GitHub action to ensure code doesn't drift away from the project style.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Loop optimisation

I've been trying to optimise the code for sequential recombination algorithm in Julia and spotted an interesting fact.

Below you can see a very simple piece of code that is basically a translation of the C++ FastJet code. The profiler shows that lines that only have nndist = Δ2 in them take up as much time as really computationally heavy lines like Δ2 = _dist(i, j, _eta, _phi). I checked the stack that the profiler throws out and it's actually calling the == operation inside of the iterate() function that causes that. So this probably comes not from the assignment itself but from the end of the loop iteration. Simple loops typically get optimised but here it's something strange.

I used Profile.jl and profiled the entire algorithm in a loop. This function is called from there frequently and only exists to be compiled to avoid dynamic dispatch in the main algorithm.

# src/Algo.jl @ line 45
function _upd_nn_nocross!(i::Int, from::Int, to::Int, _eta, _phi, _R2, _nndist, _nn)
    nndist::Float64 = _R2
    nn = i
    Δ2::Float64 = 0.0
    for j in from:(i-1)
        Δ2 = _dist(i, j, _eta, _phi)
        if Δ2 <= nndist
            nn = j
            nndist = Δ2 # too long for an assignment
        end
    end
    for j in (i+1):to
        Δ2 = _dist(i, j, _eta, _phi)
        if Δ2 <= nndist
            nn = j
            nndist = Δ2 # too long for an assignment
        end
    end
    _nndist[i] = nndist
    _nn[i] = nn
    nothing
end

Remove tiling parameter from current ClusterSequence

The ::ClusterSequence.tiling member is really an internal detail of the tiled algorithm, so it should be factored out and used as separate parameter in the tiled algorithm

Define public interface

We should define what the public interfaces are to the package. At the moment we have:

Input

N2Plain

Defaults to a collection of arbitrary objects, where px, py, pz, energy are defined for each object (in JetReconstruction, i.e., JetReconstruction.px(T)) to extract the relevant 4-vector parameters.

I like that, but the default used in Particle.jl is a little bit hokey for my taste as the index numbers are mapped to each quantity in a magic number kind of way.

N2Tiled

A collection of PseudoJet structs.

The thing I dislike currently is that the struct has an implementation specific parameter, viz. _cluster_hist_index. It's also rather close to LorentzVectorHEP, albeit that circular parameters are cached (not sure if we need that).

Proposal

Standardise to accept input particles as a collection of LorentzVectors, used via the LorentzVectorHEP package, which provides the appropriate px, py, pz, energy definitions already. This also achieves #5.

I still like the idea of documenting for type T the idea of defining JetReconstruction.px{T}, etc. that provides a clear way for users to use their own types with the package.

Internal Use

It is probably optimal that each algorithm uses an internal EDM when users pass arbitrary types, as these can be tuned for internal efficiency (e.g., the N2Tiled method can keep using its current PseudoJet).

Output

N2Plain returns the same type of jet objects that it was given as input. N2Tiled returns its own PseudoJet objects.

Also the sequences are different, with N2Tiled returning a FastJet-like clusterisation history and N2Plain a vector of ints, giving the merging index sequence for each input particle (and intermediates).

N2Tiled also filters on $p_{Tmin}$ and N2Plain doesn't.

I am really not sure what's best here - uniformly return something like LorentzVectorCyl, which gives $(p_T, \eta, \phi)$ coordinates?

Nicer strategy switching

ATM, the algorithm strategy is manual - --strategy=N2Plain in chep.jl. The string is mapped to an enum in Julia, but there is an annoying manual mapping from the option string to the enum. I could not find a way to extract a nice string from the enum names, did I miss something?

However, reading about this issue, it seems people often use types in these cases, not enums. Should see if that is a nicer way. If a concrete type was used to map to each algorithm it could contain a user friendly string for the CLI.

@grasph - would you be willing to relicense the code under an MIT license?

use existing Lorentz Vector package

https://github.com/JuliaHEP/LorentzVectorHEP.jl instead of https://github.com/gojakuch/JetReconstruction.jl/blob/main/src/Particle.jl ?