Comments (14)
Sure, it seems interesting. It looks like this density does not have an analytical integral, so I would implement it as an unnormalized density as a first step. Do you want to give it a try and make a PR?
from numba-stats.
Hi. Yes, I actually need it for a framework I'm working on, so I can make a PR in the process.
Cool, let me know if I can help. Have a look at how other pdfs are implemented. Note that the numba-stats distributions are vectorized over the data argument (x) for performance reasons, but not over the parameters. For your implementation you need the erfc function, which you can import from math.
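To sketch what that could look like: below is a plain-Python, unnormalized RooCMSShape-like density, assuming the form erfc((alpha - x) * beta) * exp(-gamma * (x - peak)) as I read it off the RooFit source (the parameter names and the exact form are assumptions to verify). The loop is vectorized over x only, matching the numba-stats convention; in numba-stats the loop would be compiled with @njit, which is omitted here so the sketch runs stand-alone.

```python
import math
import numpy as np

def cms_shape_density(x, alpha, beta, gamma, peak):
    """Unnormalized RooCMSShape-like density (assumed form, see above).

    Vectorized over the data argument x, scalar in the parameters,
    following the numba-stats convention.
    """
    r = np.empty_like(x)
    for i in range(len(x)):
        r[i] = math.erfc((alpha - x[i]) * beta) * math.exp(-gamma * (x[i] - peak))
    return r

# Toy evaluation on a Z-peak-like mass range (illustrative numbers only)
x = np.linspace(50.0, 130.0, 5)
y = cms_shape_density(x, 60.0, 0.05, 0.02, 90.0)
```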
Hmm, I'm interested, however, in the total number of signal and background events when fitting the Z peak with iminuit, so I thought I'd do an extended binned likelihood fit, meaning I do need the cdf for the integral.
Basically, what I'm trying to do is port this RooFit fitter: https://github.com/cms-egamma/egm_tnp_analysis/blob/master/libCpp/histFitter.C to Python with iminuit for my framework.
Hi Hans, perhaps this should be moved into an iminuit discussion, but after dissecting the file that I sent above a bit, there are a few caveats to be addressed to translate it into Python.
Number 1: I have to implement RooCMSShape and also a convolution of a gaussian with a crystal ball in numba-stats.
Number 2: Some RooHistPdf instances are used at some point, which, if I understood the ROOT documentation correctly, estimate a pdf from a histogram density by using polynomial interpolation between the bins (perhaps KDE would be better here). I also have to implement that in Python.
Number 3: Here, convolutions between pdfs are performed using RooFit's FCONV, which uses an FFT. These also have to be implemented. They appear to be experimental in zfit.
Number 4: Since I'm interested in the total number of signal and background events, an extended binned likelihood is to be used, but the analytic cdf is not going to be available, meaning it has to be approximated either by numerical integration of the pdf or by doing pdf(bin center) * bin width. Both appear to be present in iminuit, which is very helpful.
Number 1 I can implement as unnormalized densities in numba-stats. Do you have any tips/suggestions for the other ones? Should we move this discussion somewhere? I will probably need some community help through that process to do it correctly. Thank you in advance! :)
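The two cdf approximations mentioned under Number 4 can be compared on a toy example. This sketch uses a normal distribution purely for illustration, since its exact cdf lets us measure the error of the midpoint rule:

```python
import numpy as np
from scipy.stats import norm

# Bin edges on a toy fit range; the normal pdf stands in for a density
# whose analytic cdf we pretend not to have.
edges = np.linspace(-3.0, 3.0, 31)
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)

exact = np.diff(norm.cdf(edges))        # exact integral of the pdf per bin
midpoint = norm.pdf(centers) * widths   # pdf(bin center) * bin width

err = np.max(np.abs(exact - midpoint))  # small for reasonably fine binning
```

For reasonably fine binning the midpoint rule is already quite accurate; the error shrinks quadratically with the bin width.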
Yes, that seems complicated. We don't have an efficient way to compute integrals numerically in Numba, so numerical integrals and convolutions are currently not supported.
Number 1: I have to implement RooCMSShape and also a convolution of a gaussian with a crystal ball in numba-stats.
I don't have a ready-made solution for the convolution. You can compute it numerically with scipy.integrate.quad, but it is going to be slow (even though quad is a pretty good integrator).
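As a rough illustration of the quad route, here is a hedged sketch of a direct numerical convolution with a Gaussian kernel. The exponential density below is just a stand-in for illustration, not the crystal ball, and the 8-sigma integration window is a choice I made so the adaptive integrator does not miss the narrow kernel:

```python
import math
from scipy.integrate import quad

def gauss(x, mu, sigma):
    # Normal density, used as the smearing kernel
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def conv_pdf(x, pdf, mu, sigma):
    """(pdf convolved with gauss)(x) by direct numerical integration; slow but simple."""
    # The Gaussian kernel is negligible beyond a few sigma, so restrict
    # the integration window around x for speed and robustness.
    lo, hi = x - mu - 8.0 * sigma, x - mu + 8.0 * sigma
    val, _ = quad(lambda t: pdf(t) * gauss(x - t, mu, sigma), lo, hi)
    return val

def expon(t):
    # Stand-in density: unit exponential on t >= 0
    return math.exp(-t) if t >= 0.0 else 0.0

# Example: the exponential density smeared with a 0.2-wide Gaussian resolution
y = conv_pdf(1.0, expon, 0.0, 0.2)
```

Calling conv_pdf once per data point inside a fit is where the slowness bites; each evaluation is a full adaptive integration.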
Number 2: Some RooHistPdf instances are used at some point, which, if I understood the ROOT documentation correctly, estimate a pdf from a histogram density by using polynomial interpolation between the bins (perhaps KDE would be better here). I also have to implement that in Python.
I advise against treating a histogram like a pdf and interpolating it, because there is no way to do this correctly: you have to invent information that is not there, and thus you may bias the fit. Interpolation should only be used as a last resort when there is no other option. Can you elaborate on why this is needed? In some situations it is avoidable. The Template class in iminuit allows you to fit histogram templates to data histograms just fine without interpolation.
Number 3: Here, convolutions between pdfs are performed using RooFit's FCONV, which uses an FFT. These also have to be implemented. They appear to be experimental in zfit.
Makes sense at first glance to do it with an FFT. Implementing general numerical convolutions in numba-stats is a major undertaking.
Number 4: Since I'm interested in the total number of signal and background events, an extended binned likelihood is to be used, but the analytic cdf is not going to be available, meaning it has to be approximated either by numerical integration of the pdf or by doing pdf(bin center) * bin width. Both appear to be present in iminuit, which is very helpful.
It looks like the CDF might be computed analytically.
Oh, the convolution of a gaussian with a crystal ball has an analytic form: I believe I've seen it in a paper in the past, and it is also implemented here.
All the other "problems" are in these 4 lines.
What appears to be happening is that the sigResPass and sigResFail pdfs are convolved via FFT with some RooHistPdf pdfs, which, as you said, adds bias because RooHistPdfs invent information.
So far I haven't been thinking about how to avoid that. I've just been thinking in terms of "translate this code into python".
Oh, the convolution of a gaussian with a crystal ball has an analytic form: I believe I've seen it in a paper in the past, and it is also implemented here.
That sounds good, but then we need to integrate that again (the pdf of the convolution) to get the cdf of the convolution.
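If no closed form turns up, one fallback is to tabulate the convolution's pdf on a grid and integrate it cumulatively, e.g. with scipy.integrate.cumulative_trapezoid. In this sketch a normal pdf stands in for the convolution's pdf, only so the approximate cdf can be checked against the exact norm.cdf:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid
from scipy.stats import norm

# Tabulate the pdf on a grid and build an approximate cdf by cumulative
# trapezoidal integration; the grid spacing controls the accuracy.
x = np.linspace(-6.0, 6.0, 1001)
cdf_approx = cumulative_trapezoid(norm.pdf(x), x, initial=0)

# Error relative to the exact cdf, shifted to start at 0 at the grid edge
err = np.max(np.abs(cdf_approx - (norm.cdf(x) - norm.cdf(-6.0))))
```

The trapezoid error falls quadratically with the grid spacing, so a fairly coarse table is often enough for a binned fit.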
So... Mathematica? Unless someone else has done the integral.
I managed to find the analytical integral of the cms_shape, so I will implement the pdf and cdf for this distribution; you don't have to start working on this one.
I am not keen on implementing the crystalball convolved with a gaussian, so if you want to give that a try, I invite you to make a PR.
Great! Thanks! Did you evaluate it or did you find it somewhere? (I'm curious where, that's why I'm asking.) I could attempt the other one, sure.
Even if a distribution doesn't have an analytic CDF, I would be able to just tackle that by passing in use_pdf="approximate" or use_pdf="numerical" to the iminuit extended likelihood.
The other caveats that I mentioned worry me more because of these convolutions of estimated pdfs from templates with analytic pdfs. I will think about how to tackle that.
In general, there are some more distributions that ROOT has that are used frequently in particle physics and would be nice to have for users of the pythonic ecosystem. The Novosibirsk or Hypatia distributions are a couple, for instance (I see there is a 2021 issue for Hypatia).
Great! Thanks! Did you evaluate it or did you find it somewhere? (I'm curious where, that's why I'm asking.) I could attempt the other one, sure.
I did not find it; I believe the RooFit implementation that you showed me computes the normalization numerically.
I used wolframalpha.com to compute the integral.
Even if a distribution doesn't have an analytic CDF, I would be able to just tackle that by passing in use_pdf="approximate" or use_pdf="numerical" to the iminuit extended likelihood.
True, but we need the integral anyway to normalize the density to get a pdf.
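When the integral has no closed form, the normalization itself can be done numerically over the fit range. This sketch uses exp(-x**2 / 2) because its full-range integral, sqrt(2*pi), is known exactly and serves as a check:

```python
import numpy as np
from scipy.integrate import trapezoid

# Normalize an unnormalized density numerically: the integral over the
# fit range becomes the normalization constant.
x = np.linspace(-8.0, 8.0, 2001)
density = np.exp(-0.5 * x**2)
norm_const = trapezoid(density, x)  # close to sqrt(2*pi) ~ 2.5066

pdf = density / norm_const          # now integrates to 1 on the range
```

In a fit the normalization constant depends on the shape parameters, so it has to be recomputed every time they change, which is exactly the cost that an analytic integral avoids.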
The other caveats that I mentioned worry me more because of these convolutions of estimated pdfs from templates with analytic pdfs. I will think about how to tackle that.
Yes, these things are challenging. When you have to deal with convolutions, I believe that RooFit is still the best system.
Yes, we need both the Hypatia and the Novosibirsk distribution. Implementing Hypatia is quite involved, but Novosibirsk looks doable. I just added a new issue for Novosibirsk. If you'd like to contribute something else, you could give Novosibirsk a shot.
I will go ahead with implementing the CBExGaussShape first, because I need it, and then I will look into Novosibirsk. Novosibirsk looks like it has a nice analytic form, unlike Hypatia.
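For reference, the Novosibirsk density could be sketched like this, using the parameterization I recall from the RooNovosibirsk implementation. The exact form, the symbol names (mu: peak, sigma: width, tau: tail asymmetry), and the tau -> 0 Gaussian limit are all assumptions to be checked against a reference before relying on them:

```python
import math

def novosibirsk_density(x, mu, sigma, tau):
    """Unnormalized Novosibirsk density (assumed parameterization, see above)."""
    if abs(tau) < 1e-7:
        # tau -> 0 limit reduces to a plain Gaussian
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2)
    ln4 = math.log(4.0)
    lam = math.sinh(tau * math.sqrt(ln4)) / (sigma * tau * math.sqrt(ln4))
    arg = 1.0 + lam * tau * (x - mu)
    if arg <= 0.0:
        # outside the support of the log term
        return 0.0
    return math.exp(-0.5 * (math.log(arg) / tau) ** 2 - 0.5 * tau * tau)
```

With tau > 0 the peak stays at mu, but the density falls off more slowly on one side than the other, which is the asymmetric tail the distribution is used for.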