
Comments (16)

AndrewSpano commented on September 4, 2024

Indeed, I had to check the initial*.csv file for the tangent at the origin:

finally_works

from sms2021-tra-tra.

AndrewSpano commented on September 4, 2024
  • Task 1

    The idea is to start with single-particle events and then move to many-particle events. Let's take a look at a single particle event:

    one_p_track

    Now we have to "take" those points to the Hough space (aka parameter space), using the following transformation:
    y = mx + b <=> b = -xm + y. Each point (x, y) is thus converted into a line in (m, b) space, like so:

    one_p_hough_space

    We can then try to find the intersection points of all those lines, and the "most common" intersection will give the coefficients (m, b) for which the points align best in the original space. This yields:

    one_p_prediction


    Now it's time to move to many-particle events. Let's take a look at their tracks:

    many_p_tracks

    Their corresponding lines in the Hough space will look like this:

    many_p_hough_space

    It's a bit messy. Maybe finding exact intersections will not yield accurate results. Let's give it a shot:

    many_p_intersections

    Indeed, the results are not very good. The rounding error introduced to count close lines as similar is way too big. This can be solved by binning: we discretize the 2D parameter space into small bins, each of which accumulates a score (vote) for every intersection that lies inside it. Each bin represents a pair of (m, b) parameters. After the "voting procedure", we select the bins with the top 24 values. This yields:

    many_p_bins

    which looks like it fits the data way better than the previous method. Judging by eye, it looks like only 3 tracks were not found.
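The binning procedure described above can be sketched roughly as follows. This is a self-contained toy sketch, not the project's code: the parameter ranges, bin counts, the `hough_binned` name and the demo tracks are all my own assumptions.

```python
import numpy as np

def hough_binned(points, m_range=(-2.0, 2.0), b_range=(-2.0, 2.0),
                 n_bins=200, top_k=25):
    """Vote in a discretised (m, b) space: every hit (x, y) defines the
    line b = -x*m + y, and each bin that line crosses gets one vote."""
    ms = np.linspace(*m_range, n_bins)           # sampled slope values
    db = (b_range[1] - b_range[0]) / n_bins      # intercept bin width
    acc = np.zeros((n_bins, n_bins), dtype=int)  # accumulator: (m-bin, b-bin)
    for x, y in points:
        bs = -x * ms + y                         # intercept at every m sample
        idx = np.floor((bs - b_range[0]) / db).astype(int)
        ok = (idx >= 0) & (idx < n_bins)
        acc[np.arange(n_bins)[ok], idx[ok]] += 1
    # keep the top_k highest-voted bins and map them back to (m, b)
    flat = np.argsort(acc.ravel())[::-1][:top_k]
    mi, bi = np.unravel_index(flat, acc.shape)
    return [(ms[i], b_range[0] + (j + 0.5) * db) for i, j in zip(mi, bi)]

# toy demo: hits from two noiseless tracks, y = 0.5x + 0.1 and y = -0.3x + 0.4
pts = [(x, 0.5 * x + 0.1) for x in np.linspace(0.0, 1.0, 10)] \
    + [(x, -0.3 * x + 0.4) for x in np.linspace(0.0, 1.0, 10)]
cands = hough_binned(pts, top_k=10)
```

The highest-voted bins land next to the true (m, b) pairs; with real, noisy hits the bin size controls the trade-off between merging nearby tracks and splitting one track across several bins.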

  • Task 2

    Firstly, we have to define when two tracks are approximately equal. Two tracks are considered equal when their lines are close to each other, up to a small threshold. The function that does this is:

    same_track_function

    • For the efficiency rate, we define it as the percentage of true (ground-truth) tracks that match at least one not-previously-matched estimated track. With this, the efficiency rate we get is 0.75:

      efficiency_rate

    • For the fake rate, we define it as the percentage of estimated tracks that are nowhere close to any true track. With this, we get a fake rate of 0:

      fake_rate

    • For the duplicate rate, we define it as the percentage of estimated tracks that are similar to each other (i.e. really close, practically representing the same true track). With this, we get a duplicate rate of 0.21:

      duplicate_rate

    The Hough transform is a good method overall. One way to possibly make it more accurate is to store the intersection points for every bin; if a bin is later selected, the parameters "m" and "b" are estimated as the median of all the intersections in that bin. This improves the accuracy by up to 0.001, which is a slight (yet noticeable) improvement.
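A minimal sketch of how such a closeness test and the three rates could be computed. The tolerance values, the greedy one-to-one matching used for the efficiency, and the function names are my assumptions; the actual same_track_function may well differ.

```python
def same_track(track_a, track_b, m_tol=0.01, b_tol=0.01):
    """Two (m, b) tracks count as the same if slope and intercept
    both agree within a small threshold."""
    (m1, b1), (m2, b2) = track_a, track_b
    return abs(m1 - m2) <= m_tol and abs(b1 - b2) <= b_tol

def track_rates(true_tracks, est_tracks, close=same_track):
    """Efficiency, fake and duplicate rates over lists of (m, b) tracks."""
    # efficiency: true tracks matched by some not-previously-matched estimate
    unused, matched = list(est_tracks), 0
    for t in true_tracks:
        hit = next((e for e in unused if close(t, e)), None)
        if hit is not None:
            matched += 1
            unused.remove(hit)
    # fake rate: estimates close to no true track at all
    fakes = sum(1 for e in est_tracks
                if not any(close(t, e) for t in true_tracks))
    # duplicate rate: estimates close to an earlier estimate
    dups = sum(1 for i, e in enumerate(est_tracks)
               if any(close(e, other) for other in est_tracks[:i]))
    n_true, n_est = len(true_tracks), len(est_tracks)
    return matched / n_true, fakes / n_est, dups / n_est

# toy example: two true tracks, three estimates (the second duplicates the first)
eff, fake, dup = track_rates([(0.5, 0.1), (-0.3, 0.4)],
                             [(0.5, 0.1), (0.505, 0.105), (2.0, 2.0)])
```

With a greedy one-to-one matching, each estimated track can raise the efficiency at most once, so duplicates do not inflate it.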

  • Task 3

    Took a look at the Helix equation. Made some notes. Have some questions to ask.

AndrewSpano commented on September 4, 2024

Tasks 4 & 5

Tasks 4 and 5 are linked (4 can't actually be done without doing 5 first). I uploaded the code in src/utils/metrics.py (in the new pull request). I don't think it makes much sense to post the code here, so I will post the results instead:

  • Efficiency Rate. Implemented exactly as described above. Computed it separately for every estimated track:

    efficiencies

  • Fake Rate. An estimated track is considered fake if the number of hits from its leading particle is less than a given fraction of the track's total (book-kept) hits. E.g.: if a track has 10 hits but the leading particle contributes only 2 of them (there could be 5 different particles with 2 hits each), the track is considered fake. This was not the case for any estimated track:

    fakes

  • Duplicate Rate. Two tracks are considered duplicates if they have the same leading particle (correct me if I'm wrong!). The results I got for the estimated tracks are:

    duplicates
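Assuming each estimated track is book-kept as a list of the particle ids of its hits (a data layout I am guessing at, not the project's actual one), the leading-particle logic above can be sketched as:

```python
from collections import Counter

def leading_particle(track_hits):
    """Particle id contributing the most hits to an estimated track,
    with its fraction of the track's book-kept hits."""
    pid, n = Counter(track_hits).most_common(1)[0]
    return pid, n / len(track_hits)

def is_fake(track_hits, min_fraction=0.5):
    """Fake if the leading particle owns less than min_fraction of the hits."""
    return leading_particle(track_hits)[1] < min_fraction

def duplicate_rate(tracks):
    """Fraction of tracks whose leading particle already led an earlier one."""
    leaders = [leading_particle(t)[0] for t in tracks]
    dups = sum(1 for i, pid in enumerate(leaders) if pid in leaders[:i])
    return dups / len(tracks)

# toy tracks: the first two share leading particle 1 (duplicates); the third
# has no clear leader, so it counts as fake
tracks = [[1, 1, 1, 2], [1, 1, 1, 5], [2, 3, 4, 5]]
fake_flags = [is_fake(t) for t in tracks]
dup = duplicate_rate(tracks)
```

The `min_fraction` threshold is a placeholder; the text above only says "a fraction of the total number of hits".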



ToDo: As mentioned yesterday, the number of distinct (truth) particles in an event is unknown, so it is wrong to sample the "top 25" tracks, since in real-life scenarios we will not have this information. To fix this, I thought of keeping every track whose corresponding bin contains a minimum number of book-kept hits. This will probably give us more tracks than needed, but at least we won't undershoot. Maybe we can discuss this in the next meeting.
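The proposed fix fits in a couple of lines; the threshold value and the names here are placeholders, not the project's:

```python
def select_tracks(bins, min_hits=8):
    """Keep every accumulator bin whose vote count (book-kept hits) reaches
    a minimum threshold, instead of sampling a fixed top-N."""
    return [params for params, votes in bins if votes >= min_hits]

# hypothetical accumulator output: ((m, b), votes) pairs
bins = [((0.5, 0.1), 12), ((0.7, 0.3), 3), ((-0.3, 0.4), 9)]
selected = select_tracks(bins)
```

Unlike a fixed top-N, this adapts to however many particles the event actually contains, at the cost of possibly over-selecting.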

asalzburger commented on September 4, 2024

The efficiency rate per track is what we call the matching probability; we use "efficiency" only for the ensemble of tracks.

AndrewSpano commented on September 4, 2024

Updates on the tasks:

  • The per-track matching-probability plot has been done (the fixes regarding the selection of the leading particle were also implemented):

    matching_probability

  • Efficiency per eta-pT values: almost done; I need to fix some bugs and will likely upload the code tomorrow

  • Selection function: still working on it

noemina commented on September 4, 2024

That's good!! Can you try to put the matching probability on the x-axis?
Something like this, for example ;)
image

asalzburger commented on September 4, 2024

Exactly, there we can see how we can make a cut.

asalzburger commented on September 4, 2024

Screenshot 2021-07-12 at 10 18 45

asalzburger commented on September 4, 2024

This is purely in the (x/y)-plane; you ignore that the hit and the track have a longitudinal (z) component.

That's justified for this example, because the magnetic field is constant along the z-axis:
-> the helix is a circle in the transverse plane

AndrewSpano commented on September 4, 2024
  • Task 7

    To complete this task I parsed all the dataset (.csv) files and grouped together (in new dataframes) the objects that have similar values of eta and p_T. Then I ran the whole pipeline for each range of values to get the efficiency. I got the following results:

    etas_efficiency

    pt_efficiency

    The algorithm seems to perform OK across the different eta values. For the transverse momentum, it does not perform well for lower values (could be a bug in my code?). Maybe the tracks are so similar that a bin size of (0.0001, 0.0001) can't distinguish them. I will have to look into this further.

  • Task 6

    Fixed the plot, placed the matching probability in the x-axis:

    matching_probability

  • Tasks 8 & 9 (selection/classification & helix Hough transform)

    Ongoing, I have some ideas for the selection phase which I will try to implement tomorrow.

AndrewSpano commented on September 4, 2024
  • Task 8

    Selection: Most methods have been implemented, I have a few questions though for the holes.

  • Task 9

    Ongoing.

AndrewSpano commented on September 4, 2024

Regarding the helical (circular when projected in 2d space) Hough transform:

I started by tackling the one-particle files. Let's take for example this one:

one_p_xy

Since in real-life scenarios we will not know the momentum of a particle for any given hit, I tried to solve this fitting problem without using any extra information. The main idea I came up with (after consulting with Noemi) is to exploit the fact that the circular track must always go through the origin (0, 0). Since finding a circle requires finding 3 variables (the two center coordinates and the radius), this constraint reduces the problem to finding 2 variables that are linearly related. This means that we can apply the methods from the previous task. For a more detailed explanation of the math:

explanation
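The reduction can also be written down directly. A circle with centre (a, b) passing through the origin has radius² = a² + b², so any hit (x, y) on it satisfies x² + y² = 2ax + 2by; solving for b gives a straight line in the (a, b) parameter space, which is exactly the setting of the linear Hough transform from the previous task. This is my reading of the idea, and the notebook's exact formulation may differ:

```python
import numpy as np

def hit_to_center_line(x, y):
    """Map a hit (x, y) to the line b = intercept + slope * a in the
    (a, b) space of circle centres, using x^2 + y^2 = 2ax + 2by.
    Assumes y != 0 (a hit with y == 0 pins down a directly instead)."""
    slope = -x / y
    intercept = (x * x + y * y) / (2.0 * y)
    return slope, intercept

# sanity check on a hypothetical centre (1.0, 0.5); the radius is chosen
# so the circle passes through the origin
cx, cy = 1.0, 0.5
r = np.hypot(cx, cy)
hits = [(cx + r * np.cos(t), cy + r * np.sin(t))
        for t in np.linspace(0.3, 2.0, 5)]
lines = [hit_to_center_line(x, y) for x, y in hits]
# every line b = intercept + slope * a passes through the true centre (a, b)
```

Since every hit of one track yields a line through the same point (a, b), the binned voting from Task 1 applies unchanged.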

The lines in the Hough Space look like this:

one_p_tracks

For the selection of the candidate bins, I selected those with at least 10-11 hits, since previous analysis showed that there are on average 14 hits per particle. The result I got is:

one_p_fit

I tried to estimate the emittance angle from the circle as the angle between the tangent line at the origin and the x-axis:

one_p_phi

The ground truth value I got (computed as phi = arctan2(py, px)) is slightly different from the estimated one:

fail_phi

I might have to look for bugs in the code.
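One possible source of the mismatch: the tangent at the origin is perpendicular to the line joining the origin to the circle centre (a, b), so phi0 = arctan2(b, a) ± π/2, and the two candidates differ by π. Picking the wrong branch flips the estimated angle. A small self-consistency check, with a hypothetical momentum and centre of my own choosing:

```python
import numpy as np

def phi_candidates(a, b):
    """Tangent direction at the origin for a circle of centre (a, b) that
    passes through the origin: perpendicular to the radius vector, giving
    two candidates that differ by pi (the travel direction picks one)."""
    radial = np.arctan2(b, a)
    return radial + np.pi / 2.0, radial - np.pi / 2.0

# hypothetical truth: momentum (px, py) = (2.0, 1.0); for one charge sign
# the centre lies perpendicular to the momentum, e.g. (a, b) = (-py, px)
px, py = 2.0, 1.0
phi_true = np.arctan2(py, px)
cands = phi_candidates(-py, px)
```

One of the two candidates should reproduce arctan2(py, px); if the estimate is consistently off, checking which branch is being taken would be my first suspect.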

Now regarding the many-particle files, I picked this one at random:

many_p_xy

The lines in the Hough Space look like this:

many_p_tracks

Running the algorithm yields the following results:

many_p_fit

By assessing the estimated helical tracks we get the following results:

results

which look kind of promising.

There are some notes I have to make:

  • I want to make the matching-probability and efficiency-per-eta-pT plots. I wanted to use PyROOT for them, but I can't get it to work: it says it is not compatible with the Python version I am using. I tried Python 2.7, 3.4, 3.5, 3.6 and 3.9, and none of them worked. I will try again tomorrow to fix it.
  • I found a possible bug in my efficiency function. When computing the efficiency, I didn't consider the case where 2 estimated tracks have the same leading particle. In that case both tracks are counted as legitimate, when they shouldn't be, because they are technically duplicates. I was thinking of fixing it; I just wanted to consult with you before pushing the changes to these functions.

AndrewSpano commented on September 4, 2024
  • After trying to install ROOT for over 3 hours (I even tried compiling from source, ran into many errors, and checked the forum but couldn't solve some of them since there were no relevant posts), I concluded that there is no point installing it on this laptop, as I recently bought a new one (which I am currently setting up). I will try to install it on the new laptop once I start using it.

  • I plotted (for one estimated track) the lines in the Hough Space and the estimated intersection point from the binning:

    intersect_plot

    The same results can be seen for any other estimated track, if chosen.

  • I refactored the metrics (they are now about as modular as they can get).

  • I couldn't find the reason why the estimated emittance angle is incorrect. One thing I noticed: for single-particle files, since we know that all the hits belong to the same particle (with the same curvature and momentum), I believe (?) all of them should be expected to yield the same emittance angle. This is not the case:

    strange

    Is this behavior expected?

  • I did a bit of research on other algorithms for line/curve fitting, and I found out about the RANSAC algorithm.

    This is another algorithm that works nicely on ideal data, but probably not on real-world data. The idea behind it is to detect outliers by iteratively sampling random data points and fitting the model's parameters to them; the fitted model is then used to count the outliers. The parameters with the fewest outliers (i.e. the most inliers) are kept and preferred in the future.

    I implemented it for the x-y plane (circle fitting) and the r-z plane (line fitting). The results are quite good for such a simple algorithm. For the circle (x-y plane) fitting, it managed to find all of the tracks in just 19 seconds:

    circle_ransac

    For the line (r-z plane) fitting, it managed to find 21/25 tracks in just 16 seconds:

    line_ransac

    The notebook can be found in src/notebooks/issue3/RANSAC.ipynb.
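For reference, the core of a RANSAC circle fit constrained to pass through the origin can be sketched as below. A circle through the origin is fixed by two more points via the linear system 2ax + 2by = x² + y². All names, thresholds and the demo data are my own assumptions, not the notebook's:

```python
import random
import numpy as np

def fit_center(p1, p2):
    """Centre (a, b) of the circle through the origin and two hits,
    from the linear system 2*a*x + 2*b*y = x^2 + y^2."""
    A = 2.0 * np.array([p1, p2])
    rhs = np.array([p1[0] ** 2 + p1[1] ** 2, p2[0] ** 2 + p2[1] ** 2])
    return np.linalg.solve(A, rhs)

def ransac_circle(hits, n_iter=200, tol=0.05, seed=0):
    """RANSAC sketch: repeatedly fit a circle-through-origin to two random
    hits and keep the model with the most inliers."""
    rng = random.Random(seed)
    best_center, best_inliers = None, []
    for _ in range(n_iter):
        p1, p2 = rng.sample(hits, 2)
        try:
            a, b = fit_center(p1, p2)
        except np.linalg.LinAlgError:
            continue                      # degenerate: collinear with origin
        r = np.hypot(a, b)
        inliers = [h for h in hits
                   if abs(np.hypot(h[0] - a, h[1] - b) - r) < tol]
        if len(inliers) > len(best_inliers):
            best_center, best_inliers = (a, b), inliers
    return best_center, best_inliers

# demo: 12 hits on the circle of (hypothetical) centre (1.0, 0.5) through
# the origin, plus two outliers
r_true = np.hypot(1.0, 0.5)
hits = [(1.0 + r_true * np.cos(t), 0.5 + r_true * np.sin(t))
        for t in np.linspace(0.2, 2.2, 12)] + [(3.0, 3.0), (-2.0, 1.0)]
center, inliers = ransac_circle(hits)
```

The tolerance `tol` plays the same role as the bin size in the Hough approach: it decides how far a hit may sit from the candidate circle and still count as an inlier.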

  • PS. I haven't uploaded the code yet because the previous pull request hasn't been merged, and I think (?) that if I switch branches (to upload the new code) I will lose all the changes. We can sort this out in the morning!

noemina commented on September 4, 2024

Effect of binning on the evaluation of the crossing point.
Screenshot_20210720_093902

asalzburger commented on September 4, 2024
  • estimate the phi_0 & compare it with `particles_final['particle_id'].momentum.phi`

asalzburger commented on September 4, 2024

That's reassuring.

