
Comments (16)

AndrewSpano commented on September 4, 2024

Indeed, I had to check the initial*.csv file for the tangent at the origin:

finally_works

from sms2021-tra-tra.

AndrewSpano commented on September 4, 2024
  • Task 1

    The idea is to start with single-particle events and then move to many-particle events. Let's take a look at a single particle event:

    one_p_track

    Now we have to "take" those points to the Hough space (aka parameter space), using the following transformation:
    y = mx + b <=> b = -xm + y. Each point (x, y) is thus converted into a line in (m, b) space, like so:

    one_p_hough_space

    We can then try to find the intersection points of all those lines, and the "most common" intersection will give the coefficients (m, b) for which the points align best in the original space. This yields:

    one_p_prediction


    Now it's time to move to many-particle events. Let's take a look at their tracks:

    many_p_tracks

    Their corresponding lines in the Hough space will look like this:

    many_p_hough_space

    It's a bit messy. Maybe finding exact intersections will not yield accurate results. Let's give it a shot:

    many_p_intersections

    Indeed, the results are not very good. The rounding error introduced to count close lines as similar is way too big. This can be solved by binning: we discretize the 2D parameter space into small bins, each of which accumulates a score (vote) for every intersection that lies inside it. Each bin represents a pair of (m, b) parameters. After the "voting procedure", we select the bins with the top 24 values. This yields:

    many_p_bins

    which looks like it fits the data way better than the previous method. Judging by eye, it looks like only 3 tracks were not found.
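The binning procedure described above can be sketched roughly as follows. This is a self-contained toy sketch, not the project's code: the parameter ranges, bin counts, the `hough_binned` name and the demo tracks are all my own assumptions.

```python
import numpy as np

def hough_binned(points, m_range=(-2.0, 2.0), b_range=(-2.0, 2.0),
                 n_bins=200, top_k=25):
    """Vote in a discretised (m, b) space: every hit (x, y) defines the
    line b = -x*m + y, and each bin that line crosses gets one vote."""
    ms = np.linspace(*m_range, n_bins)           # sampled slope values
    db = (b_range[1] - b_range[0]) / n_bins      # intercept bin width
    acc = np.zeros((n_bins, n_bins), dtype=int)  # accumulator: (m-bin, b-bin)
    for x, y in points:
        bs = -x * ms + y                         # intercept at every m sample
        idx = np.floor((bs - b_range[0]) / db).astype(int)
        ok = (idx >= 0) & (idx < n_bins)
        acc[np.arange(n_bins)[ok], idx[ok]] += 1
    # keep the top_k highest-voted bins and map them back to (m, b)
    flat = np.argsort(acc.ravel())[::-1][:top_k]
    mi, bi = np.unravel_index(flat, acc.shape)
    return [(ms[i], b_range[0] + (j + 0.5) * db) for i, j in zip(mi, bi)]

# toy demo: hits from two noiseless tracks, y = 0.5x + 0.1 and y = -0.3x + 0.4
pts = [(x, 0.5 * x + 0.1) for x in np.linspace(0.0, 1.0, 10)] \
    + [(x, -0.3 * x + 0.4) for x in np.linspace(0.0, 1.0, 10)]
cands = hough_binned(pts, top_k=10)
```

The highest-voted bins land next to the true (m, b) pairs; with real, noisy hits the bin size controls the trade-off between merging nearby tracks and splitting one track across several bins.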

  • Task 2

    Firstly, we have to define when two tracks are approximately equal. Two tracks are considered equal when their lines are close to each other, up to a small threshold. The function that does this is:

    same_track_function

    • For the efficiency rate, we define it as the percentage of true (ground-truth) tracks that match at least one not-previously-matched estimated track. With this, the efficiency rate we get is 0.75:

      efficiency_rate

    • For the fake rate, we define it as the percentage of estimated tracks that are nowhere close to any true track. With this, we get a fake rate of 0:

      fake_rate

    • For the duplicate rate, we define it as the percentage of estimated tracks that are similar to each other (i.e. really close, practically representing the same true track). With this, we get a duplicate rate of 0.21:

      duplicate_rate

    The Hough transform is a good method overall. One way to possibly make it more accurate is to store the intersection points for every bin; if a bin is later selected, the parameters "m" and "b" are estimated as the median of all the intersections in that bin. This improves the accuracy by up to 0.001, which is a slight (yet noticeable) improvement.
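A minimal sketch of how such a closeness test and the three rates could be computed. The tolerance values, the greedy one-to-one matching used for the efficiency, and the function names are my assumptions; the actual same_track_function may well differ.

```python
def same_track(track_a, track_b, m_tol=0.01, b_tol=0.01):
    """Two (m, b) tracks count as the same if slope and intercept
    both agree within a small threshold."""
    (m1, b1), (m2, b2) = track_a, track_b
    return abs(m1 - m2) <= m_tol and abs(b1 - b2) <= b_tol

def track_rates(true_tracks, est_tracks, close=same_track):
    """Efficiency, fake and duplicate rates over lists of (m, b) tracks."""
    # efficiency: true tracks matched by some not-previously-matched estimate
    unused, matched = list(est_tracks), 0
    for t in true_tracks:
        hit = next((e for e in unused if close(t, e)), None)
        if hit is not None:
            matched += 1
            unused.remove(hit)
    # fake rate: estimates close to no true track at all
    fakes = sum(1 for e in est_tracks
                if not any(close(t, e) for t in true_tracks))
    # duplicate rate: estimates close to an earlier estimate
    dups = sum(1 for i, e in enumerate(est_tracks)
               if any(close(e, other) for other in est_tracks[:i]))
    n_true, n_est = len(true_tracks), len(est_tracks)
    return matched / n_true, fakes / n_est, dups / n_est

# toy example: two true tracks, three estimates (the second duplicates the first)
eff, fake, dup = track_rates([(0.5, 0.1), (-0.3, 0.4)],
                             [(0.5, 0.1), (0.505, 0.105), (2.0, 2.0)])
```

With a greedy one-to-one matching, each estimated track can raise the efficiency at most once, so duplicates do not inflate it.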

  • Task 3

    Took a look at the Helix equation. Made some notes. Have some questions to ask.

AndrewSpano commented on September 4, 2024

Tasks 4 & 5

Tasks 4 and 5 are linked (4 can't actually be done without doing 5 first). I uploaded the code in src/utils/metrics.py (in the new pull request). I don't think it makes much sense to post the code here, so I will post the results instead:

  • Efficiency Rate. Implemented exactly as described above. Computed it separately for every estimated track:

    efficiencies

  • Fake Rate. An estimated track is considered fake if the number of hits from its leading particle is less than a given fraction of the track's total (book-kept) hits. E.g.: if a track has 10 hits but the leading particle contributes only 2 of them (there could be 5 different particles with 2 hits each), the track is considered fake. This was not the case for any estimated track:

    fakes

  • Duplicate Rate. Two tracks are considered duplicates if they have the same leading particle (correct me if I'm wrong!). The results I got for the estimated tracks are:

    duplicates
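Assuming each estimated track is book-kept as a list of the particle ids of its hits (a data layout I am guessing at, not the project's actual one), the leading-particle logic above can be sketched as:

```python
from collections import Counter

def leading_particle(track_hits):
    """Particle id contributing the most hits to an estimated track,
    with its fraction of the track's book-kept hits."""
    pid, n = Counter(track_hits).most_common(1)[0]
    return pid, n / len(track_hits)

def is_fake(track_hits, min_fraction=0.5):
    """Fake if the leading particle owns less than min_fraction of the hits."""
    return leading_particle(track_hits)[1] < min_fraction

def duplicate_rate(tracks):
    """Fraction of tracks whose leading particle already led an earlier one."""
    leaders = [leading_particle(t)[0] for t in tracks]
    dups = sum(1 for i, pid in enumerate(leaders) if pid in leaders[:i])
    return dups / len(tracks)

# toy tracks: the first two share leading particle 1 (duplicates); the third
# has no clear leader, so it counts as fake
tracks = [[1, 1, 1, 2], [1, 1, 1, 5], [2, 3, 4, 5]]
fake_flags = [is_fake(t) for t in tracks]
dup = duplicate_rate(tracks)
```

The `min_fraction` threshold is a placeholder; the text above only says "a fraction of the total number of hits".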



ToDo: As mentioned yesterday, the number of distinct (truth) particles in an event is unknown, so it is wrong to sample the "top 25" tracks, since in real-life scenarios we will not have this information. To fix this, I thought of keeping every track whose corresponding bin contains a minimum number of book-kept hits. This will probably give us more tracks than needed, but at least we won't undershoot. Maybe we can discuss this in the next meeting.
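The proposed fix fits in a couple of lines; the threshold value and the names here are placeholders, not the project's:

```python
def select_tracks(bins, min_hits=8):
    """Keep every accumulator bin whose vote count (book-kept hits) reaches
    a minimum threshold, instead of sampling a fixed top-N."""
    return [params for params, votes in bins if votes >= min_hits]

# hypothetical accumulator output: ((m, b), votes) pairs
bins = [((0.5, 0.1), 12), ((0.7, 0.3), 3), ((-0.3, 0.4), 9)]
selected = select_tracks(bins)
```

Unlike a fixed top-N, this adapts to however many particles the event actually contains, at the cost of possibly over-selecting.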

asalzburger commented on September 4, 2024

The efficiency rate per track is what we call the matching probability; we use "efficiency" only for the ensemble of tracks.

AndrewSpano commented on September 4, 2024

Updates on the tasks:

  • The per-track matching-probability plot has been done (the fixes regarding the selection of the leading particle were also implemented):

    matching_probability

  • Efficiency per eta-pT values: almost done; I need to fix some bugs and will likely upload the code tomorrow

  • Selection function: still working on it

noemina commented on September 4, 2024

That's good!! Can you try to put the matching probability on the x-axis?
Something like this, for example ;)
image

asalzburger commented on September 4, 2024

Exactly, there we can see how we can make a cut.

asalzburger commented on September 4, 2024

Screenshot 2021-07-12 at 10 18 45

asalzburger commented on September 4, 2024

This is purely in the (x/y)-plane; you ignore that the hit and the track have a longitudinal (z) component.

That's justified for this example, because the magnetic field is constant along the z-axis:
-> the helix is a circle in the transverse plane

AndrewSpano commented on September 4, 2024
  • Task 7

    To complete this task I parsed all the dataset (.csv) files and grouped together (in new dataframes) the objects that have similar values of eta and p_T. Then I ran the whole pipeline for each range of values to get the efficiency. I got the following results:

    etas_efficiency

    pt_efficiency

    The algorithm seems to perform OK across the different eta values. For the transverse momentum, it does not perform well for lower values (could be a bug in my code?). Maybe the tracks are so similar that a bin size of (0.0001, 0.0001) can't distinguish them. I will have to look into this further.

  • Task 6

    Fixed the plot, placed the matching probability in the x-axis:

    matching_probability

  • Tasks 8 & 9 (selection/classification & helix Hough transform)

    Ongoing, I have some ideas for the selection phase which I will try to implement tomorrow.

AndrewSpano commented on September 4, 2024
  • Task 8

    Selection: Most methods have been implemented, I have a few questions though for the holes.

  • Task 9

    Ongoing.

AndrewSpano commented on September 4, 2024

Regarding the helical (circular when projected in 2d space) Hough transform:

I started by tackling the one-particle files. Let's take for example this one:

one_p_xy

Since in real-life scenarios we will not know the momentum of a particle for any given hit, I tried to solve this fitting problem without using any extra information. The main idea I came up with (after consulting with Noemi) is to exploit the fact that the circular track must always go through the origin (0, 0). Since finding a circle requires finding 3 variables (the two center coordinates and the radius), this constraint reduces the problem to finding 2 variables that are linearly related. This means that we can apply the methods from the previous task. For a more detailed explanation of the math:

explanation
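The reduction can also be written down directly. A circle with centre (a, b) passing through the origin has radius² = a² + b², so any hit (x, y) on it satisfies x² + y² = 2ax + 2by; solving for b gives a straight line in the (a, b) parameter space, which is exactly the setting of the linear Hough transform from the previous task. This is my reading of the idea, and the notebook's exact formulation may differ:

```python
import numpy as np

def hit_to_center_line(x, y):
    """Map a hit (x, y) to the line b = intercept + slope * a in the
    (a, b) space of circle centres, using x^2 + y^2 = 2ax + 2by.
    Assumes y != 0 (a hit with y == 0 pins down a directly instead)."""
    slope = -x / y
    intercept = (x * x + y * y) / (2.0 * y)
    return slope, intercept

# sanity check on a hypothetical centre (1.0, 0.5); the radius is chosen
# so the circle passes through the origin
cx, cy = 1.0, 0.5
r = np.hypot(cx, cy)
hits = [(cx + r * np.cos(t), cy + r * np.sin(t))
        for t in np.linspace(0.3, 2.0, 5)]
lines = [hit_to_center_line(x, y) for x, y in hits]
# every line b = intercept + slope * a passes through the true centre (a, b)
```

Since every hit of one track yields a line through the same point (a, b), the binned voting from Task 1 applies unchanged.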

The lines in the Hough Space look like this:

one_p_tracks

For the selection of the candidate bins, I selected those with at least 10-11 hits, since previous analysis showed that there are on average 14 hits per particle. The result I got is:

one_p_fit

I tried to estimate the emittance angle from the circle as the angle between the tangent line at the origin and the x-axis:

one_p_phi

The ground truth value I got (computed as phi = arctan2(py, px)) is slightly different from the estimated one:

fail_phi

I might have to look for bugs in the code.
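One possible source of the mismatch: the tangent at the origin is perpendicular to the line joining the origin to the circle centre (a, b), so phi0 = arctan2(b, a) ± π/2, and the two candidates differ by π. Picking the wrong branch flips the estimated angle. A small self-consistency check, with a hypothetical momentum and centre of my own choosing:

```python
import numpy as np

def phi_candidates(a, b):
    """Tangent direction at the origin for a circle of centre (a, b) that
    passes through the origin: perpendicular to the radius vector, giving
    two candidates that differ by pi (the travel direction picks one)."""
    radial = np.arctan2(b, a)
    return radial + np.pi / 2.0, radial - np.pi / 2.0

# hypothetical truth: momentum (px, py) = (2.0, 1.0); for one charge sign
# the centre lies perpendicular to the momentum, e.g. (a, b) = (-py, px)
px, py = 2.0, 1.0
phi_true = np.arctan2(py, px)
cands = phi_candidates(-py, px)
```

One of the two candidates should reproduce arctan2(py, px); if the estimate is consistently off, checking which branch is being taken would be my first suspect.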

Now regarding the many-particle files, I picked this one at random:

many_p_xy

The lines in the Hough Space look like this:

many_p_tracks

Running the algorithm yields the following results:

many_p_fit

By assessing the estimated helical tracks we get the following results:

results

which look kind of promising.

There are some notes I have to make:

  • I want to make the matching-probability and efficiency-per-eta-pT plots. I wanted to use PyROOT for them, but I can't get it to work: it says it is not compatible with the Python version I am using. I tried Python 2.7, 3.4, 3.5, 3.6 and 3.9, and none of them worked. I will try again tomorrow to fix it.
  • I found a possible bug in my efficiency function. When computing the efficiency, I didn't consider the case where 2 estimated tracks have the same leading particle. In that case both tracks are counted as legitimate, when they shouldn't be, because they are technically duplicates. I was thinking of fixing it; I just wanted to consult with you before pushing the changes to these functions.

AndrewSpano commented on September 4, 2024
  • After trying to install ROOT for over 3 hours (I even tried compiling from source, ran into many errors, and checked the forum but couldn't solve some of them since there were no relevant posts), I concluded that there is no point installing it on this laptop, as I recently bought a new one (which I am currently setting up). I will try to install it on the new laptop once I start using it.

  • I plotted (for one estimated track) the lines in the Hough Space and the estimated intersection point from the binning:

    intersect_plot

    The same results can be seen for any other estimated track, if chosen.

  • I refactored the metrics (they are now about as modular as they can get).

  • I couldn't find the reason why the estimated emittance angle is incorrect. One thing I noticed: for single-particle files, since we know that all the hits belong to the same particle (with the same curvature and momentum), I believe (?) all of them should be expected to yield the same emittance angle. This is not the case:

    strange

    Is this behavior expected?

  • I did a bit of research on other algorithms for line/curve fitting, and I found out about the RANSAC algorithm.

    This is another algorithm that works nicely on ideal data, but probably not on real-world data. The idea behind it is to detect outliers by iteratively sampling random data points and fitting the model's parameters to them; the fitted model is then used to count the outliers. The parameters with the fewest outliers (i.e. the most inliers) are kept and preferred in the future.

    I implemented it for the x-y plane (circle fitting) and the r-z plane (line fitting). The results are quite good for such a simple algorithm. For the circle (x-y plane) fitting, it managed to find all of the tracks in just 19 seconds:

    circle_ransac

    For the line (r-z plane) fitting, it managed to find 21/25 tracks in just 16 seconds:

    line_ransac

    The notebook can be found in src/notebooks/issue3/RANSAC.ipynb.
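For reference, the core of a RANSAC circle fit constrained to pass through the origin can be sketched as below. A circle through the origin is fixed by two more points via the linear system 2ax + 2by = x² + y². All names, thresholds and the demo data are my own assumptions, not the notebook's:

```python
import random
import numpy as np

def fit_center(p1, p2):
    """Centre (a, b) of the circle through the origin and two hits,
    from the linear system 2*a*x + 2*b*y = x^2 + y^2."""
    A = 2.0 * np.array([p1, p2])
    rhs = np.array([p1[0] ** 2 + p1[1] ** 2, p2[0] ** 2 + p2[1] ** 2])
    return np.linalg.solve(A, rhs)

def ransac_circle(hits, n_iter=200, tol=0.05, seed=0):
    """RANSAC sketch: repeatedly fit a circle-through-origin to two random
    hits and keep the model with the most inliers."""
    rng = random.Random(seed)
    best_center, best_inliers = None, []
    for _ in range(n_iter):
        p1, p2 = rng.sample(hits, 2)
        try:
            a, b = fit_center(p1, p2)
        except np.linalg.LinAlgError:
            continue                      # degenerate: collinear with origin
        r = np.hypot(a, b)
        inliers = [h for h in hits
                   if abs(np.hypot(h[0] - a, h[1] - b) - r) < tol]
        if len(inliers) > len(best_inliers):
            best_center, best_inliers = (a, b), inliers
    return best_center, best_inliers

# demo: 12 hits on the circle of (hypothetical) centre (1.0, 0.5) through
# the origin, plus two outliers
r_true = np.hypot(1.0, 0.5)
hits = [(1.0 + r_true * np.cos(t), 0.5 + r_true * np.sin(t))
        for t in np.linspace(0.2, 2.2, 12)] + [(3.0, 3.0), (-2.0, 1.0)]
center, inliers = ransac_circle(hits)
```

The tolerance `tol` plays the same role as the bin size in the Hough approach: it decides how far a hit may sit from the candidate circle and still count as an inlier.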

  • PS. I haven't uploaded the code yet because the previous pull request hasn't been merged, and I think (?) that if I switch branches (to upload the new code) I will lose all the changes. We can sort this out in the morning!

noemina commented on September 4, 2024

Effect of binning on the evaluation of the crossing point.
Screenshot_20210720_093902

asalzburger commented on September 4, 2024
  • estimate the phi_0 & compare it with `particles_final['particle_id'].momentum.phi`

asalzburger commented on September 4, 2024

That's reassuring.

