Comments (14)
Sure, it seems interesting. It looks like this density does not have an analytical integral, so I would implement it as an unnormalized density as a first step. Do you want to give it a try and make a PR?
from numba-stats.
Hi. Yes, I actually need it for a framework I'm working on, so I can make a PR in the process.
Cool, let me know if I can help. Have a look at how other pdfs are implemented. Note that the numba-stats distributions are vectorized over the data argument (x) for performance reasons, but not over the parameters. For your implementation you need the erfc function, which you can import from math.
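To sketch what that could look like: below is a plain-Python, unnormalized RooCMSShape-like density, assuming the form erfc((alpha - x) * beta) * exp(-gamma * (x - peak)) as I read it off the RooFit source (the parameter names and the exact form are assumptions to verify). The loop is vectorized over x only, matching the numba-stats convention; in numba-stats the loop would be compiled with @njit, which is omitted here so the sketch runs stand-alone.

```python
import math
import numpy as np

def cms_shape_density(x, alpha, beta, gamma, peak):
    """Unnormalized RooCMSShape-like density (assumed form, see above).

    Vectorized over the data argument x, scalar in the parameters,
    following the numba-stats convention.
    """
    r = np.empty_like(x)
    for i in range(len(x)):
        r[i] = math.erfc((alpha - x[i]) * beta) * math.exp(-gamma * (x[i] - peak))
    return r

# Toy evaluation on a Z-peak-like mass range (illustrative numbers only)
x = np.linspace(50.0, 130.0, 5)
y = cms_shape_density(x, 60.0, 0.05, 0.02, 90.0)
```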
Hmm, I'm interested, however, in the total number of signal and background events when fitting the Z peak with iminuit, so I thought I'd do an extended binned likelihood fit, meaning I do need the cdf for the integral.
Basically, what I'm trying to do is port this RooFit fitter: https://github.com/cms-egamma/egm_tnp_analysis/blob/master/libCpp/histFitter.C to Python with iminuit for my framework.
Hi Hans, perhaps this should be moved into an iminuit discussion, but after dissecting the file that I sent above a bit, there are a few caveats to be addressed to translate it into Python.
Number 1: I have to implement RooCMSShape and also a convolution of a gaussian with a crystal ball in numba-stats.
Number 2: Some RooHistPdf instances are used at some point, which, if I understood the ROOT documentation correctly, estimate a pdf from a histogram density by using polynomial interpolation between the bins (perhaps KDE would be better here). I also have to implement that in Python.
Number 3: Here, convolutions between pdfs are performed using RooFit's FCONV, which uses an FFT. These also have to be implemented. They appear to be experimental in zfit.
Number 4: Since I'm interested in the total number of signal and background events, an extended binned likelihood is to be used, but the analytic cdf is not going to be available, meaning it has to be approximated either by numerical integration of the pdf or by doing pdf(bin center) * bin width. Both appear to be present in iminuit, which is very helpful.
Number 1 I can implement as unnormalized densities in numba-stats. Do you have any tips/suggestions for the other ones? Should we move this discussion somewhere? I will probably need some community help through that process to do it correctly. Thank you in advance! :)
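The two cdf approximations mentioned under Number 4 can be compared on a toy example. This sketch uses a normal distribution purely for illustration, since its exact cdf lets us measure the error of the midpoint rule:

```python
import numpy as np
from scipy.stats import norm

# Bin edges on a toy fit range; the normal pdf stands in for a density
# whose analytic cdf we pretend not to have.
edges = np.linspace(-3.0, 3.0, 31)
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)

exact = np.diff(norm.cdf(edges))        # exact integral of the pdf per bin
midpoint = norm.pdf(centers) * widths   # pdf(bin center) * bin width

err = np.max(np.abs(exact - midpoint))  # small for reasonably fine binning
```

For reasonably fine binning the midpoint rule is already quite accurate; the error shrinks quadratically with the bin width.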
Yes, that seems complicated. We don't have an efficient way to compute integrals numerically in Numba, so numerical integrals and convolutions are currently not supported.
Number 1: I have to implement RooCMSShape and also a convolution of a gaussian with a crystal ball in numba-stats.
I don't have a ready-made solution for the convolution. You can compute it numerically with scipy.integrate.quad, but it is going to be slow (even though quad is a pretty good integrator).
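As a rough illustration of the quad route, here is a hedged sketch of a direct numerical convolution with a Gaussian kernel. The exponential density below is just a stand-in for illustration, not the crystal ball, and the 8-sigma integration window is a choice I made so the adaptive integrator does not miss the narrow kernel:

```python
import math
from scipy.integrate import quad

def gauss(x, mu, sigma):
    # Normal density, used as the smearing kernel
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def conv_pdf(x, pdf, mu, sigma):
    """(pdf convolved with gauss)(x) by direct numerical integration; slow but simple."""
    # The Gaussian kernel is negligible beyond a few sigma, so restrict
    # the integration window around x for speed and robustness.
    lo, hi = x - mu - 8.0 * sigma, x - mu + 8.0 * sigma
    val, _ = quad(lambda t: pdf(t) * gauss(x - t, mu, sigma), lo, hi)
    return val

def expon(t):
    # Stand-in density: unit exponential on t >= 0
    return math.exp(-t) if t >= 0.0 else 0.0

# Example: the exponential density smeared with a 0.2-wide Gaussian resolution
y = conv_pdf(1.0, expon, 0.0, 0.2)
```

Calling conv_pdf once per data point inside a fit is where the slowness bites; each evaluation is a full adaptive integration.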
Number 2: Some RooHistPdf instances are used at some point, which, if I understood the ROOT documentation correctly, estimate a pdf from a histogram density by using polynomial interpolation between the bins (perhaps KDE would be better here). I also have to implement that in Python.
I advise against treating a histogram like a pdf and interpolating it, because there is no way to do this correctly: you have to invent information that is not there, and thus you may bias the fit. Interpolation should only be used as a last resort when there is no other option. Can you elaborate on why this is needed? In some situations it is avoidable. The Template class in iminuit allows you to fit histogram templates to data histograms just fine without interpolation.
Number 3: Here, convolutions between pdfs are performed using RooFit's FCONV, which uses an FFT. These also have to be implemented. They appear to be experimental in zfit.
Makes sense at first glance to do it with an FFT. Implementing general numerical convolutions in numba-stats is a major undertaking.
Number 4: Since I'm interested in the total number of signal and background events, an extended binned likelihood is to be used, but the analytic cdf is not going to be available, meaning it has to be approximated either by numerical integration of the pdf or by doing pdf(bin center) * bin width. Both appear to be present in iminuit, which is very helpful.
It looks like the CDF might be computed analytically.
Oh, the convolution of a gaussian with a crystal ball has an analytic form: I believe I've seen it in a paper in the past, and it is also implemented here.
All the other "problems" are in these 4 lines.
What appears to be happening is that the sigResPass and sigResFail pdfs are convolved via FFT with some RooHistPdf pdfs, which, as you said, adds bias because RooHistPdfs invent information.
So far I haven't been thinking about how to avoid that. I've just been thinking in terms of "translate this code into python".
Oh, the convolution of a gaussian with a crystal ball has an analytic form: I believe I've seen it in a paper in the past, and it is also implemented here.
That sounds good, but then we need to integrate that again (the pdf of the convolution) to get the cdf of the convolution.
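If no closed form turns up, one fallback is to tabulate the convolution's pdf on a grid and integrate it cumulatively, e.g. with scipy.integrate.cumulative_trapezoid. In this sketch a normal pdf stands in for the convolution's pdf, only so the approximate cdf can be checked against the exact norm.cdf:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import norm

# Tabulate the pdf on a grid and build an approximate cdf by cumulative
# trapezoidal integration; the grid spacing controls the accuracy.
x = np.linspace(-6.0, 6.0, 1001)
cdf_approx = cumulative_trapezoid(norm.pdf(x), x, initial=0)

# Error relative to the exact cdf, shifted to start at 0 at the grid edge
err = np.max(np.abs(cdf_approx - (norm.cdf(x) - norm.cdf(-6.0))))
```

The trapezoid error falls quadratically with the grid spacing, so a fairly coarse table is often enough for a binned fit.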
So... Mathematica? Unless someone else has done the integral.
I managed to find the analytical integral of the cms_shape, so I will implement the pdf and cdf for this distribution; you don't have to start working on this one.
I am not keen on implementing the crystalball convolved with a gaussian, so if you want to give that a try, I invite you to make a PR.
Great! Thanks! Did you evaluate it or did you find it somewhere? (I'm curious where, that's why I'm asking.) I could attempt the other one, sure.
Even if a distribution doesn't have an analytic CDF, I would be able to just tackle that by passing in use_pdf="approximate" or use_pdf="numerical" to the iminuit extended likelihood.
The other caveats that I mentioned worry me more because of these convolutions of estimated pdfs from templates with analytic pdfs. I will think about how to tackle that.
In general, there are some more distributions that ROOT has that are used frequently in particle physics and would be nice to have for users of the pythonic ecosystem. The Novosibirsk or Hypatia distributions are a couple, for instance (I see there is a 2021 issue for Hypatia).
Great! Thanks! Did you evaluate it or did you find it somewhere? (I'm curious where, that's why I'm asking.) I could attempt the other one, sure.
I did not find it; I believe the RooFit implementation that you showed me computes the normalization numerically.
I used wolframalpha.com to compute the integral.
Even if a distribution doesn't have an analytic CDF, I would be able to just tackle that by passing in use_pdf="approximate" or use_pdf="numerical" to the iminuit extended likelihood.
True, but we need the integral anyway to normalize the density to get a pdf.
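When the integral has no closed form, the normalization itself can be done numerically over the fit range. This sketch uses exp(-x**2 / 2) because its full-range integral, sqrt(2*pi), is known exactly and serves as a check:

```python
import numpy as np
from scipy.integrate import trapezoid

# Normalize an unnormalized density numerically: the integral over the
# fit range becomes the normalization constant.
x = np.linspace(-8.0, 8.0, 2001)
density = np.exp(-0.5 * x**2)
norm_const = trapezoid(density, x)  # close to sqrt(2*pi) ~ 2.5066

pdf = density / norm_const          # now integrates to 1 on the range
```

In a fit the normalization constant depends on the shape parameters, so it has to be recomputed every time they change, which is exactly the cost that an analytic integral avoids.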
The other caveats that I mentioned worry me more because of these convolutions of estimated pdfs from templates with analytic pdfs. I will think about how to tackle that.
Yes, these things are challenging. When you have to deal with convolutions, I believe that RooFit is still the best system.
Yes, we need both the Hypatia and the Novosibirsk distribution. Implementing Hypatia is quite involved, but Novosibirsk looks doable. I just added a new issue for Novosibirsk. If you'd like to contribute something else, you could give Novosibirsk a shot.
I will go ahead with implementing the CBExGaussShape first, because I need it, and then I will look into Novosibirsk. Novosibirsk looks like it has a nice analytic form, unlike Hypatia.
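For reference, the Novosibirsk density could be sketched like this, using the parameterization I recall from the RooNovosibirsk implementation. The exact form, the symbol names (mu: peak, sigma: width, tau: tail asymmetry), and the tau -> 0 Gaussian limit are all assumptions to be checked against a reference before relying on them:

```python
import math

def novosibirsk_density(x, mu, sigma, tau):
    """Unnormalized Novosibirsk density (assumed parameterization, see above)."""
    if abs(tau) < 1e-7:
        # tau -> 0 limit reduces to a plain Gaussian
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    ln4 = math.log(4.0)
    lam = math.sinh(tau * math.sqrt(ln4)) / (sigma * tau * math.sqrt(ln4))
    arg = 1.0 + lam * tau * (x - mu)
    if arg <= 0.0:
        # outside the support of the log term
        return 0.0
    return math.exp(-0.5 * (math.log(arg) / tau) ** 2 - 0.5 * tau * tau)
```

With tau > 0 the peak stays at mu, but the density falls off more slowly on one side than the other, which is the asymmetric tail the distribution is used for.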