Code Monkey home page Code Monkey logo

Comments (14)

HDembinski avatar HDembinski commented on August 13, 2024

Sure, it seems interesting. It looks like this density does not have an analytical integral, so I would implement it is an unnormalized density as a first step. Do you want to give it a try and make a PR?

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

Hi. Yes I actually need it for a framework Iā€™m working on so I can make a PR in the process.

from numba-stats.

HDembinski avatar HDembinski commented on August 13, 2024

Cool, let me know if I can help. Have a look at how other pdfs are implemented. Note that the numba-stats distributions are vectorized on the data arguments (x) for performance reasons, but not on the parameters. For your implementation you need the erfc function, which you can import from math.

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

Hmm, I'm interested however in the total number of signal and background events when fitting the z peak with iminuit so I thought I'd do an extended binned likelihood fit, meaning I do need the cdf for the integral.
Basically what I'm trying to do is write this rootfit thing: https://github.com/cms-egamma/egm_tnp_analysis/blob/master/libCpp/histFitter.C into python with iminuit for my framework.

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

Hi Hans, perhaps this should be moved into an iminuit discussion but after dissecting the above file that I sent a bit there a few caveats to be addressed to translate it into python.

Number 1: I have to implement RooCMSShape and also a convolution of a gaussian with a crystal ball in numba-stats.
Number 2: Some RooHistPdf instances are used at some point which If I understood the root documentation correctly, estimate a pdf from a histogram density by using polynomial interpolation between the bins (perhaps KDE would be better here). I also have to implement that in python.
Number 3: here, convolutions between pdfs are used by using roofit's FCONV which uses a FFT to do it. These also have to implemented. They appear to be experimental in zfit.
Number 4: Since I'm interested in the total number of signal and background events, an extended binned likelihood is to be used but the analytic cdf is not gonna be available meaning that it has to be approximated either by numerical integration of the pdf or by doing pdf(bin center) * bin width. Both appear to be present in iminuit which is very helpful.

Number 1, I can implement as unormalized densities in numba-stats. Do you have any tips/suggestions for the other ones? Should we move this discussion somewhere? I will probably need some community help through that process to do it correctly. Thank you in advance! :)

from numba-stats.

HDembinski avatar HDembinski commented on August 13, 2024

Yes, that seems complicated. We don't have an efficient way to compute integrals numerically in Numba, so numerical integrals and convolutions are currently not supported.

Number 1: I have to implement RooCMSShape and also a convolution of a gaussian with a crystal ball in numba-stats.

I don't have a ready-made solution for the convolution. You can compute it numerically with scipy.integrate.quad, but it is going to be slow (even though quad is a pretty good integrator).

Number 2: Some RooHistPdf instances are used at some point which If I understood the root documentation correctly, estimate a pdf from a histogram density by using polynomial interpolation between the bins (perhaps KDE would be better here). I also have to implement that in python.

I advice against treating a histogram like a pdf and interpolating it, because there is no way to do this correctly, you have to invent information that is not there and thus you may bias the fit. Interpolation should only be as a last resort when there is no other option. Can elaborate why this is needed? In some situations, this is avoidable. The Template class in iminuit allows you to fit histogram templates to data histograms just fine without interpolation.

Number 3: here, convolutions between pdfs are used by using roofit's FCONV which uses a FFT to do it. These also have to implemented. They appear to be experimental in zfit.

Makes sense at first glance to do it with an FFT. Implementing general numerical convolutions in numba-stats is a major undertaking.

Number 4: Since I'm interested in the total number of signal and background events, an extended binned likelihood is to be used but the analytic cdf is not gonna be available meaning that it has to be approximated either by numerical integration of the pdf or by doing pdf(bin center) * bin width. Both appear to be present in iminuit which is very helpful.

It looks like the CDF might be computed analytically.

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

Oh the convolution of a gaussian with a crystal ball has an analytic form: I've seen it in some paper I believe in the past and is also implemented here.

All the other "problems" are in these 4 lines.
What appears to be happening is that the sigResPass and sigResFail pdfs are convoluted using FFT with some RooHistPdf pdfs which as you said add bias because RooHistPdfs invent informartion.
So far I haven't been thinking about how to avoid that. I've just been thinking in terms of "translate this code into python".

from numba-stats.

HDembinski avatar HDembinski commented on August 13, 2024

Oh the convolution of a gaussian with a crystal ball has an analytic form: I've seen it in some paper I believe in the past and is also implemented here.

That sounds good, but then we need to integrate that again (the pdf of the convolution) to get the cdf of the convolution.

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

Soo....Mathematica? Unless someone else has done the integral.

from numba-stats.

HDembinski avatar HDembinski commented on August 13, 2024

I managed to find the analytical integral of the cms_shape, so I will implement the pdf and cdf for this distribution, you don't have to start working on this.

I am not keen on implementing the crystalball convolved with a gaussian, so if you want to give that a try, I invite you to make a PR.

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

Great! Thanks! Did you evaluate it or did you find it somewhere (I'm curious where that's why I'm asking)? I could attempt the other one sure.

Even if a distribution doesn't have an analytic CDF, I would be able to just tackle that by passing in use_pdf="approximate" or use_pdf="numerical" to the iminuit extended likelihood.
The other caveats that I mentioned worry me more because of these convolutions of estimated pdfs from templates with analytic pdfs. I will think about how to tackle that.

In general that are some more distributions that ROOT has and are used frequently in particle physics that would be nice to have for the pythonic ecosystem users. The Novosibirsk or Hypatia distributions are a couple for instance (I see there is a 2021 issue for Hypatia).

from numba-stats.

HDembinski avatar HDembinski commented on August 13, 2024

Great! Thanks! Did you evaluate it or did you find it somewhere (I'm curious where that's why I'm asking)? I could attempt the other one sure.

I did not find it, I believe the RooFit implementation that you showed me computes the normalization numerically.

I used wolframalpha.com to compute the integral.

Even if a distribution doesn't have an analytic CDF, I would be able to just tackle that by passing in use_pdf="approximate" or use_pdf="numerical" to the iminuit extended likelihood.

True, but we need the integral anyway to normalize the density to get a pdf.

The other caveats that I mentioned worry me more because of these convolutions of estimated pdfs from templates with analytic pdfs. I will think about how to tackle that.

Yes, these things are challenging. When you have to deal with convolutions, I believe that RooFit is still the best system.

from numba-stats.

HDembinski avatar HDembinski commented on August 13, 2024

Yes, we need both the Hypathia and the Novosibirsk distribution. Implementing Hypathia is quite involved, but Novosibirsk looks doable. I just added a new issue for Novosibirsk. If you like to contribute something else, you could give Novosibirsk a shot.

from numba-stats.

ikrommyd avatar ikrommyd commented on August 13, 2024

I will go ahead with implementing the CBExGaussShape first because I need it and then I will look into Novosibirsk. Novosibirsk looks like it has a nice analytic form unlike Hypatia.

from numba-stats.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    šŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. šŸ“ŠšŸ“ˆšŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ā¤ļø Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.