hdembinski / numba-stats

Numba-accelerated statistical distributions

License: MIT License

Python 95.12% Shell 0.08% Jupyter Notebook 4.79%

numba-stats's Introduction

numba-stats

We provide numba-accelerated implementations of statistical functions for common probability distributions:

Uniform
(Truncated) Normal
Log-normal
Poisson
(Truncated) Exponential
Student's t
Voigtian
Crystal Ball
Generalised double-sided Crystal Ball
Tsallis-Hagedorn, a model for the minimum bias pT distribution
Q-Gaussian
Bernstein density (not normalized to unity, use this in extended likelihood fits)
Cruijff density (not normalized to unity, use this in extended likelihood fits)
CMS-Shape

with more to come. The speed gains are huge, up to a factor of 100 compared to scipy. Benchmarks are included in the repository and are run by pytest.

The distributions are optimized for use in maximum-likelihood fits, where you query a distribution at many points with a single set of parameters.

Usage

Each distribution is implemented in a submodule. Import the submodule that you need and call the functions in the module.

from numba_stats import norm
import numpy as np

x = np.linspace(-10, 10)
mu = 2.0
sigma = 3.0

p = norm.pdf(x, mu, sigma)
c = norm.cdf(x, mu, sigma)

The functions are vectorized on the variate x, but not on the shape parameters of the distribution. Ideally, the following functions are implemented for each distribution:

pdf: probability density function
logpdf: the logarithm of the probability density function (can be computed more efficiently and accurately for some distributions)
cdf: integral of the probability density function
ppf: inverse of the cdf
rvs: to generate random variates

cdf and ppf are missing for some distributions (e.g. voigt), if there is currently no fast implementation available. logpdf is only implemented if it is more efficient and accurate compared to computing log(dist.pdf(...)). rvs is only implemented for distributions that have ppf, which is used to generate the random variates. The implementations of rvs are currently not optimized for highest performance, but turn out to be useful in practice nevertheless.
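
The remark that rvs relies on ppf refers to inverse-transform sampling: uniform random numbers in (0, 1) are mapped through the inverse of the cdf. The following is a minimal sketch of that idea using only the Python standard library. The helper rvs_from_ppf is hypothetical, and statistics.NormalDist.inv_cdf merely stands in for a ppf; this is not the numba-stats implementation.

```python
import random
from statistics import NormalDist, fmean


def rvs_from_ppf(ppf, size, seed=None):
    """Generate `size` random variates by inverse-transform sampling.

    `ppf` is any callable that maps a probability in (0, 1) to a quantile.
    """
    rng = random.Random(seed)
    out = []
    while len(out) < size:
        u = rng.random()  # uniform in [0, 1)
        if u > 0.0:       # skip an exact 0, where the ppf diverges
            out.append(ppf(u))
    return out


# stand-in for a distribution's ppf: normal with mu = 2, sigma = 3
dist = NormalDist(mu=2.0, sigma=3.0)
sample = rvs_from_ppf(dist.inv_cdf, size=10_000, seed=1)
sample_mean = fmean(sample)  # should be close to mu = 2
```

Any distribution with a ppf can be plugged in the same way, which is why the availability of rvs follows the availability of ppf.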

The distributions in numba_stats can be used in other numba-JIT'ed functions. The functions in numba_stats use a single thread, but the implementations were written so that they profit from auto-parallelization. To enable this, call them from a JIT'ed function with the arguments parallel=True, fastmath=True. You should always combine parallel=True with fastmath=True, since the latter enhances the gain from auto-parallelization.

from numba_stats import norm
import numba as nb
import numpy as np

@nb.njit(parallel=True, fastmath=True)
def norm_pdf(x, mu, sigma):
  return norm.pdf(x, mu, sigma)

# this must be an array of float
x = np.linspace(-10, 10)

# these must be floats
mu = 2.0
sigma = 3.0

# uses all your CPU cores
p = norm_pdf(x, mu, sigma)

Note that this is only faster if x has sufficient length (about 1000 elements or more). Otherwise, the parallelization overhead will make the call slower, see benchmarks below.

Troubleshooting

When you use the numba-stats distributions in a compiled function, you need to pass the expected data types. The first argument must be a numpy array of floats (32 or 64 bit). The following parameters must be floats. If you pass the wrong arguments, you will get numba errors similar to this one (where parameters were passed as integers instead of floats):

numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function pdf at 0x7ff7186b7be0>) found for signature:

 >>> pdf(array(float64, 1d, C), int64, int64)

You won't get these errors when you call the numba-stats PDFs outside of a compiled function, because I added some wrappers which automatically convert the data types for convenience. This is why you can call norm.pdf(1, 2, 3) but norm_pdf(1, 2, 3) (as implemented above) will fail.
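
One way to avoid the typing error is to convert the arguments before they reach the compiled function. The sketch below shows such a conversion with numpy only; as_float_args is a hypothetical helper introduced here for illustration and is not part of numba-stats.

```python
import numpy as np


def as_float_args(x, *params):
    """Convert x to a 1D float64 array and every parameter to a Python float."""
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    return (x,) + tuple(float(p) for p in params)


# integers in, floats out
x, mu, sigma = as_float_args([1, 2, 3], 2, 3)
```

Passing the converted values to a compiled function such as norm_pdf(x, mu, sigma) should then match the expected signature of a float array followed by float parameters.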

Benchmarks

The following benchmarks were produced on an Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz against SciPy-1.10.1. The dotted line on the right-hand figure shows the expected speedup (4x) from parallelization on a CPU with four physical cores.

We see large speed-ups with respect to scipy for almost all distributions. Also calls with short arrays profit from numba_stats, due to the reduced call-overhead. The functions voigt.pdf and t.ppf do not run faster than the scipy versions, because we call the respective scipy implementation written in FORTRAN. The advantage provided by numba_stats here is that you can call these functions from other numba-JIT'ed functions, which is not possible with the scipy implementations, and voigt.pdf still profits from auto-parallelization.

bernstein.density does not profit from auto-parallelization; on the contrary, it becomes much slower, so this should be avoided. This is a known issue: the internal implementation cannot be easily auto-parallelized.

Documentation

To get documentation, please use help() in the Python interpreter.

Functions with equivalents in scipy.stats follow the scipy calling conventions exactly, except for distributions starting with trunc..., which follow a different convention, since the scipy behavior is very impractical. Even so, note that the scipy conventions are sometimes a bit unusual, particularly in the case of the exponential, the log-normal, and the uniform distribution. See the scipy docs for details.
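
To illustrate what "unusual" means here, the sketch below converts textbook parameters into the loc/scale style arguments that scipy.stats uses for these three distributions. The helper names are made up for this example; only the mappings themselves reflect the scipy conventions.

```python
import math


def uniform_args(a, b):
    """Uniform on [a, b] -> (loc, scale) with loc = a and scale = b - a."""
    return a, b - a


def expon_args(rate):
    """Exponential with rate lambda -> (loc, scale) with scale = 1 / lambda."""
    return 0.0, 1.0 / rate


def lognorm_args(mu, sigma):
    """Log-normal of exp(N(mu, sigma)) -> (s, loc, scale) with scale = exp(mu)."""
    return sigma, 0.0, math.exp(mu)


# a uniform distribution on [-1, 3] is described by loc = -1 and scale = 4
loc, scale = uniform_args(-1.0, 3.0)
```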

Contributions

You can help by adding more distributions; patches are very welcome. Implementing a probability distribution is easy. You need to write it in simple Python that numba can understand. Special functions from scipy.special can be used after some wrapping; see the submodule numba_stats._special for how it is done.

numba-stats and numba-scipy

numba-scipy is the official package and repository for fast numba-accelerated scipy functions. Are we reinventing the wheel?

Ideally, the functionality in this package should be in numba-scipy and we hope that eventually this will be the case. In this package, we don't offer overloads for scipy functions and classes like numba-scipy does. This simplifies the implementation dramatically. numba-stats is intended as a temporary solution until fast statistical functions are included in numba-scipy. numba-stats currently does not depend on numba-scipy, only on numba and scipy.

numba-stats's Issues

Add Laplace?

Thanks for creating numba-stats!

Since you already have the exponential distribution, would it be relatively easy to add the Laplace distribution? In the meantime, I'll try to use the existing exponential to construct Laplace PDFs.
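
The construction mentioned above can be sketched as follows: a Laplace density is an exponential density mirrored around its location, with half the weight on each side. In this sketch, expon_pdf is a plain-Python stand-in with scipy-style (x, loc, scale) arguments, not the numba-stats function, and laplace_pdf is a hypothetical name.

```python
import math


def expon_pdf(x, loc, scale):
    """Stand-in for an exponential pdf with scipy-style (loc, scale) arguments."""
    z = (x - loc) / scale
    return math.exp(-z) / scale if z >= 0 else 0.0


def laplace_pdf(x, loc, scale):
    """Laplace pdf as a symmetrized exponential: half the weight on each side."""
    return 0.5 * expon_pdf(abs(x - loc), 0.0, scale)


peak = laplace_pdf(1.0, 1.0, 2.0)  # value at the location: 1 / (2 * scale)
```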

NumbaWarning: Compilation is falling back to object mode, also NumbaDeprecationWarning

Hello @HDembinski, thank you so much for making this available! It is very helpful indeed.

When I call lognorm.cdf() (python 3.10.2, numba 0.55.1, numba-stats 0.9.0, Windows), on first run (i.e., without pycache), I get a lot of unfriendly looking error messages:

C:\Program Files\Python310\lib\site-packages\numba_stats\lognorm.py:12: NumbaWarning:   
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "pdf" failed type inference due to:   NameError: name 'logpdf' is not defined
  @nb.vectorize(_signatures, cache=True)
C:\Program Files\Python310\lib\site-packages\numba\core\object_mode_passes.py:151: NumbaWarning:   Function "pdf" was compiled in object mode without forceobj=True.
  
File "..\..\..\..\Program Files\Python310\lib\site-packages\numba_stats\lognorm.py", line 13:
  @nb.vectorize(_signatures, cache=True)
  def pdf(x, s, loc, scale):
  ^

  warnings.warn(errors.NumbaWarning(warn_msg,
C:\Program Files\Python310\lib\site-packages\numba\core\object_mode_passes.py:161: NumbaDeprecationWarning:   
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit
  
File "..\..\..\..\Program Files\Python310\lib\site-packages\numba_stats\lognorm.py", line 13:
  @nb.vectorize(_signatures, cache=True)
  def pdf(x, s, loc, scale):
  ^

  warnings.warn(errors.NumbaDeprecationWarning(msg,
C:\Program Files\Python310\lib\site-packages\numba_stats\lognorm.py:12: NumbaWarning:   
Compilation is falling back to object mode WITHOUT looplifting enabled because Function "pdf" failed type inference due to:   NameError: name 'logpdf' is not defined
  @nb.vectorize(_signatures, cache=True)

However, it appears this can easily be fixed in lognorm.py by simply swapping pdf and logpdf, so that pdf sees the definition for logpdf. In terms of speed, it doesn't seem to make a difference, though.

I will go ahead and create a PR...

Several tests fail

=========================================================================================================== FAILURES ===========================================================================================================
___________________________________________________________________________________________________ test_all[crystalball_ex] ___________________________________________________________________________________________________

module = 'crystalball_ex'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
______________________________________________________________________________________________________ test_all[uniform] _______________________________________________________________________________________________________

module = 'uniform'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_____________________________________________________________________________________________________ test_all[qgaussian] ______________________________________________________________________________________________________

module = 'qgaussian'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
____________________________________________________________________________________________________ test_all[crystalball] _____________________________________________________________________________________________________

module = 'crystalball'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_____________________________________________________________________________________________________ test_all[bernstein] ______________________________________________________________________________________________________

module = 'bernstein'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
______________________________________________________________________________________________________ test_all[cruijff] _______________________________________________________________________________________________________

module = 'cruijff'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
________________________________________________________________________________________________________ test_all[norm] ________________________________________________________________________________________________________

module = 'norm'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
______________________________________________________________________________________________________ test_all[poisson] _______________________________________________________________________________________________________

module = 'poisson'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
______________________________________________________________________________________________________ test_all[tsallis] _______________________________________________________________________________________________________

module = 'tsallis'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_______________________________________________________________________________________________________ test_all[voigt] ________________________________________________________________________________________________________

module = 'voigt'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
______________________________________________________________________________________________________ test_all[lognorm] _______________________________________________________________________________________________________

module = 'lognorm'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_____________________________________________________________________________________________________ test_all[truncexpon] _____________________________________________________________________________________________________

module = 'truncexpon'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_______________________________________________________________________________________________________ test_all[expon] ________________________________________________________________________________________________________

module = 'expon'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_________________________________________________________________________________________________________ test_all[t] __________________________________________________________________________________________________________

module = 't'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
______________________________________________________________________________________________________ test_all[cpoisson] ______________________________________________________________________________________________________

module = 'cpoisson'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
_____________________________________________________________________________________________________ test_all[truncnorm] ______________________________________________________________________________________________________

module = 'truncnorm'

    @pytest.mark.parametrize("module", all_modules)
    def test_all(module):
        pytest.importorskip("pydocstyle")
        m = importlib.import_module(f"numba_stats.{module}")
        r = subp.run(["python", "-m", "pydocstyle", m.__file__], stdout=subp.PIPE)
        rc = int(r.returncode)
>       assert rc == 0, r.stdout.decode("utf8")
E       AssertionError: 
E       assert 1 == 0

tests/test_doc.py:20: AssertionError
----------------------------------------------------------------------------------------------------- Captured stderr call -----------------------------------------------------------------------------------------------------
/usr/ports/math/py-numba-stats/work-py39/.bin/python: No module named pydocstyle.__main__; 'pydocstyle' is a package and cannot be directly executed
========================================================================================== 16 failed, 211 passed in 307.00s (0:05:07) ==========================================================================================
*** Error code 1

Version: 1.2.0
Python-3.9
FreeBSD 13.2
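
The captured stderr suggests that pydocstyle is importable in this environment but has no __main__ submodule, so "python -m pydocstyle" cannot run even though pytest.importorskip succeeds. A hypothetical guard, sketched below with the standard library only, could check for that before launching the subprocess. The function name is an assumption and this is not the fix adopted by the project.

```python
import importlib.util


def package_has_main(package):
    """Return True if the package ships a __main__ submodule (needed for `python -m`)."""
    try:
        return importlib.util.find_spec(f"{package}.__main__") is not None
    except ImportError:
        # the package itself is missing or is not a package
        return False


# unittest ships a __main__ submodule, the email package does not
can_run_unittest = package_has_main("unittest")
can_run_email = package_has_main("email")
```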

Add Cruijff distribution

https://gitlab.cern.ch/LHCb-RD/vrd-b2hemu_run2/-/blob/master/scripts/fitter/fitter/src/RooCruijff.cpp

#include "fitter/RooCruijff.h"

RooCruijff::RooCruijff(){}

RooCruijff::RooCruijff(const char* name, const char* title,
    RooAbsReal& _x,
    RooAbsReal& _mean,
    RooAbsReal& _sigmaL,
    RooAbsReal& _sigmaR,
    RooAbsReal& _alphaL,
    RooAbsReal& _alphaR,
    RooAbsReal& _betaL
    //RooAbsReal& _betaR
    ) :
  RooAbsPdf(name,title), 
  x("x","x",this,_x),
  mean("mean","mean",this,_mean),
  sigmaL("sigmaL","sigmaL",this,_sigmaL),
  sigmaR("sigmaR","sigmaR",this,_sigmaR),
  alphaL("alphaL","alphaL",this,_alphaL),
  alphaR("alphaR","alphaR",this,_alphaR),
  betaL("betaL","betaL",this,_betaL)
  //betaR("betaR","betaR",this,_betaR)
{}

RooCruijff::RooCruijff(const RooCruijff& other, const char* name) :
  RooAbsPdf(other,name), 
  x("x",this,other.x),
  mean("mean",this,other.mean),
  sigmaL("sigmaL",this,other.sigmaL),
  sigmaR("sigmaR",this,other.sigmaR),
  alphaL("alphaL",this,other.alphaL),
  alphaR("alphaR",this,other.alphaR),
  betaL("betaL",this,other.betaL)
  //betaR("betaR",this,other.betaR)
{}


double RooCruijff::evaluate() const
{
  double D = (x-mean);
  double sigma = D > 0 ? sigmaR : sigmaL;
  double alpha = D > 0 ? alphaR : alphaL;
  //double beta  = D > 0 ? betaR  : betaL; 
  double beta  = betaL; 
  double arg = D*D*(1 + alpha*D*D / (2*beta*beta))/ ( 2*sigma*sigma + alpha*D*D );
  //if(TMath::Abs(2*sigma*sigma + alpha*D*D)< 1e-2 ) cout << "mah " << 2*sigma*sigma + alpha*D*D << endl ;
  return exp( - arg  );
}
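
For readers who prefer Python, the evaluate() method above can be transcribed as shown below. This is a direct scalar port of the C++ snippet, with function and argument names chosen here for illustration; it is not necessarily the implementation that ended up in numba-stats.

```python
import math


def cruijff(x, mean, sigma_l, sigma_r, alpha_l, alpha_r, beta_l):
    """Scalar port of RooCruijff::evaluate(); the result is not normalized."""
    d = x - mean
    # left or right parameters, depending on the side of the peak
    sigma = sigma_r if d > 0 else sigma_l
    alpha = alpha_r if d > 0 else alpha_l
    beta = beta_l  # the C++ code uses betaL on both sides
    arg = d * d * (1 + alpha * d * d / (2 * beta * beta)) / (
        2 * sigma * sigma + alpha * d * d
    )
    return math.exp(-arg)


at_peak = cruijff(1.0, 1.0, 0.5, 0.7, 0.1, 0.2, 1.0)  # d = 0, so exp(0)
```

With alpha set to zero on both sides, the expression reduces to a Gaussian shape with different widths to the left and right of the mean.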

poisson: fix poisson.pmf(0, 0)

In SciPy, poisson.pmf(0, 0) returns 1. Numba-stats returns NaN.
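
A common source of such a NaN is a log-space formula that evaluates k * log(mu) at k = 0 and mu = 0; whether that is the cause here is an assumption. The sketch below shows one way to handle the edge case explicitly in plain Python. It is an illustration with a made-up function name, not the numba-stats code.

```python
import math


def poisson_pmf(k, mu):
    """Poisson pmf computed in log-space, with mu = 0 handled explicitly."""
    if mu == 0:
        # all probability mass sits at k = 0
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1.0))


edge_case = poisson_pmf(0, 0)
```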

Add the .fit method?

Is it feasible to add the fit method? I'm looking at https://github.com/erdogant/distfit/ and thinking incorporating your other functions would be really good but I suppose it doesn't work without fit.

Gibberish numbers

from numba_stats import norm
norm.ppf(norm.cdf(0.3, 0, 1), 0, 1)

gives array(1.49754699e-314)
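
For reference, ppf is the inverse of cdf, so the round trip should return the input, 0.3. The sketch below checks this with the standard library's statistics.NormalDist, used here only as an independent reference for the expected value.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)
p = std_normal.cdf(0.3)
x = std_normal.inv_cdf(p)  # should recover 0.3 up to rounding
```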

Allow broadcasting over parameters

Currently, the different functions of this module only allow array input for the random variable, not for the parameters of the distributions.

It would be very useful, e.g. for likelihoods, if the functions also supported broadcasting over the parameters, e.g.:

poisson.logpmf(np.array([1, 2, 3]), np.array([2, 1, 5]))

Same for norm, etc.

voigtian

I think you have sigma and width backwards in the voigtian function

You have
return voigt_profile(x - mu, gamma, sigma)

In https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.voigt_profile.html it says
scipy.special.voigt_profile(x, sigma, gamma, out=None)

PS: I owe you a pull request on the lognormal, I haven't forgotten, just busy and very much a non-expert in git, so I need to figure that out, sigh.
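
The effect of the argument order can be checked with the limiting cases documented for scipy.special.voigt_profile: with gamma = 0 it returns a normal pdf, and with sigma = 0 it returns a Cauchy pdf. The following sketch evaluates both at x = 0; the variable names are chosen for this example.

```python
from scipy.special import voigt_profile

sigma, gamma = 1.0, 0.0
# gamma = 0: the profile reduces to a normal pdf with standard deviation sigma
gauss_peak = voigt_profile(0.0, sigma, gamma)
# the same values in swapped order describe a Cauchy with half-width 1 instead
cauchy_peak = voigt_profile(0.0, gamma, sigma)
```

The two results differ, roughly 0.399 for the normal peak versus 0.318 for the Cauchy peak, so a swapped order would silently change the shape.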

Inconsistency between scipy.stats norm.ppf and numba_stats.norm.ppf

Hello!

There seems to be a discrepancy between the methods mentioned in the title. For scalar input, scipy seems to return a scalar, while numba_stats always seems to return an array-like object. Returning an array is consistent with the scipy.stats documentation, but not with scipy's actual behavior:

In [1]: import scipy.stats as sc
In [2]: sc.norm.ppf(0.5,0.2,1)
Out[2]: 0.2

In [3]: from numba_stats import norm

In [4]: norm.ppf(0.5, 0.2, 1)
Out[4]: array(0.2)

I noticed this because some legacy code that I am running broke after updating numba_stats.

Before I change my codebase, I wanted to ask what the sensible approach is here. Will numba_stats adjust to this behavior, or will it keep returning array-like objects?
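
Independent of which behavior the library settles on, calling code can normalize a 0-dimensional array itself. The sketch below shows two standard numpy idioms; np.asarray(0.2) is only a stand-in for a result such as array(0.2).

```python
import numpy as np

result = np.asarray(0.2)   # stand-in for a 0-dimensional array such as array(0.2)
as_float = float(result)   # plain Python float
as_scalar = result[()]     # numpy scalar (np.float64)
```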

Error with scipy 1.12.0: No function 'betainc' found in cython_special

Once again thank you for this amazingly useful package!

When using numba-stats 1.4.1 with the new scipy 1.12.0 (and numpy 1.26.3, numba 0.58.1, python 3.11.5), a simple from numba_stats import lognorm leads to the following error:

Traceback (most recent call last):
  File "c:\Data\Work\Quantix\Code\quantix\test.py", line 1, in <module>
    from numba_stats import lognorm
  File "C:\Program Files\Python311\Lib\site-packages\numba_stats\lognorm.py", line 9, in <module>
    from . import norm as _norm
  File "C:\Program Files\Python311\Lib\site-packages\numba_stats\norm.py", line 9, in <module>
    from ._special import ndtri as _ndtri
  File "C:\Program Files\Python311\Lib\site-packages\numba_stats\_special.py", line 29, in <module>
    betainc = get("betainc", float64(float64, float64, float64))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\numba_stats\_special.py", line 11, in get
    addr = get_cython_function_address("scipy.special.cython_special", name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\numba\core\extending.py", line 468, in get_cython_function_address
    return _import_cython_function(module_name, function_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: No function 'betainc' found in __pyx_capi__ of 'scipy.special.cython_special'

This works fine with scipy 1.11.4.

It turns out that in scipy/special/cython_special.pyx (which is auto-generated), line 100, scipy 1.12.0 does not contain the stub for float betaincc(float, float, float) anymore which existed in scipy 1.11.4:

- :py:func:`~scipy.special.betainc`::

        double betainc(double, double, double)

In src/numba_stats/_special.py, changing
betainc = get("betainc", float64(float64, float64, float64))
to
betainc = get("betainc", double(double, double, double))
(and importing double from numba.types) does not seem to help, though.

Would you know how to fix this? Thanks!
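
One way to narrow this down, as a diagnostic sketch rather than a fix: list the names that the installed scipy actually exports in `__pyx_capi__` and look for variants of `betainc`. The helper name `find_capi_names` is hypothetical. If scipy 1.12.0 turned `betainc` into a fused-type function, Cython usually exports it under prefixed names such as `__pyx_fuse_0betainc`, but that is an assumption to check against the listing.

```python
import importlib


def find_capi_names(module_name, fragment):
    """Return the sorted C-level names exported by a Cython module that contain `fragment`."""
    module = importlib.import_module(module_name)
    # Cython modules publish their C-level functions in the __pyx_capi__ dict;
    # fall back to an empty dict for modules that do not have one.
    capi = getattr(module, "__pyx_capi__", {})
    return sorted(name for name in capi if fragment in name)


# Example call (requires scipy to be installed):
# print(find_capi_names("scipy.special.cython_special", "betainc"))
```

Whatever name shows up in that listing is the one that get_cython_function_address can resolve, provided the signature passed alongside it matches the exported C function.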

The code from the README doesn't work

The code:

from numba_stats import norm
import numba as nb
import numpy as np

@nb.njit(parallel=True, fastmath=True)
def norm_pdf(x, mu, sigma):
  return norm.pdf(x, mu, sigma)

x = np.linspace(-10, 10)
mu = 2
sigma = 3

# uses all your CPU cores
p = norm_pdf(x, mu, sigma)
print(p)

The error:

Traceback (most recent call last):
  File "/Users/user/Projects/mydir/myproject/sandbox/numba_stats_ex.py", line 14, in <module>
    p = norm_pdf(x, mu, sigma)
  File "/Users/user/miniconda3/envs/myproject/lib/python3.10/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/Users/user/miniconda3/envs/myproject/lib/python3.10/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function pdf at 0x7ff7186b7be0>) found for signature:
 
 >>> pdf(array(float64, 1d, C), int64, int64)
 
There are 2 candidate implementations:
  - Of which 2 did not match due to:
  Overload in function '_ol_pdf': File: ../../../../../../Projects/mydir/myproject/sandbox/<string>: Line 0.
    With argument(s): '(array(float64, 1d, C), int64, int64)':
   Rejected as the implementation raised a specific error:
     TypingError: argument 1 must be of type int64
  raised from /Users/user/miniconda3/envs/myproject/lib/python3.10/site-packages/numba_stats/_util.py:86

During: resolving callee type: Function(<function pdf at 0x7ff7186b7be0>)
During: typing of call at /Users/user/Projects/mydir/myproject/sandbox/numba_stats_ex.py (7)


File "numba_stats_ex.py", line 7:
def norm_pdf(x, mu, sigma):
  return norm.pdf(x, mu, sigma)

Versions:
python: 3.10.9
numba-stats: v1.4.1
numba: 0.58.1
numpy: 1.26.1
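
A possible workaround, assuming the integer parameters are the cause: the rejected signature lists int64 for mu and sigma, whereas the usage example in the README passes 2.0 and 3.0. The sketch below converts the parameters to floats before they reach the jitted wrapper. The helper name `as_float_params` is hypothetical and not part of numba-stats.

```python
def as_float_params(*params):
    """Convert distribution parameters to Python floats so that Numba types them as float64."""
    return tuple(float(p) for p in params)


# Integer literals would be typed as int64 by Numba; after conversion they are floats.
mu, sigma = as_float_params(2, 3)
print(type(mu).__name__, mu, sigma)

# The converted values can then be passed to the jitted wrapper from the report:
# p = norm_pdf(x, mu, sigma)
```

Writing mu = 2.0 and sigma = 3.0 directly has the same effect; the helper is only useful when the values come from user input or a configuration file.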

Add double-sided Crystal Ball

expon with loc (threshold) not producing expected results compared to scipy.

Could you check the following strange behavior of expon?

import numba_stats.expon
import scipy.stats
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-15, 25, 0.1)

plt.plot(x, scipy.stats.expon.pdf(x,-10, 3))
plt.plot(x, scipy.stats.expon.pdf(x,  0, 3))
plt.plot(x, scipy.stats.expon.pdf(x, 10, 3))
plt.title("Expected (scipy.stats)")
plt.show()

plt.plot(x, numba_stats.expon.pdf(x,-10, 3))
plt.plot(x, numba_stats.expon.pdf(x,  0, 3))
plt.plot(x, numba_stats.expon.pdf(x, 10, 3))
plt.title("Actual (numba_stats)")
plt.show()
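
For reference, the behavior expected from scipy.stats can be written down as a small scalar function. This is a minimal sketch of the scipy convention, where the second argument is the threshold `loc` and the third is `scale`; it is not the numba-stats implementation, and the function name `expon_pdf` is hypothetical.

```python
import math


def expon_pdf(x, loc, scale):
    """Exponential density with threshold `loc` and scale `scale` (scipy.stats convention)."""
    z = (x - loc) / scale
    if z < 0:
        # No probability mass below the threshold.
        return 0.0
    return math.exp(-z) / scale


# At the threshold the density is 1 / scale; one unit below it the density is zero.
for loc in (-10, 0, 10):
    print(loc, expon_pdf(loc, loc, 3), expon_pdf(loc - 1, loc, 3))
```

Each curve in the expected plot should therefore start at its own threshold with height 1/3 and decay from there.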

Replacing vectorize breaks code

I used uniform from a previous release, and it worked fine. Since the latest update that replaced vectorize with jit my code stopped working. Here is a minimal working example:

import numpy as np
import numba as nb
from numba import njit

from numba_stats import uniform as uniform_numba

@njit()
def get_lnprior_helper():
    """jittable helper for calculating the log prior"""
    prior = 1.0

    #loop through uniform parameters
    for i in range(3):
       prior *= uniform_numba.pdf(0.5, 0.0, 1.0)

    return prior

print(get_lnprior_helper())

This now gives the following error:

No implementation of function Function(<function pdf at 0x114f8bdc0>) found for signature:
 
 >>> pdf(float64, float64, float64)

This seems to only happen if it's within a jitted function. If I remove @njit() from above, it works fine. I'm at numba 0.53.0.

Any idea why this might be happening?

Using norm.cdf with nopython=True

I used the following function:

def bs_call_price(S, r, sigma, t, T, K):

    ttm = T - t

    if ttm < 0:
        return 0.0
    elif ttm == 0.0:
        return np.maximum(S - K, 0.0)

    vol = sigma * np.sqrt(ttm)

    d_minus = np.log(S / K) + (r - 0.5 * sigma ** 2) * ttm
    d_minus /= vol

    d_plus = d_minus + vol

    res = S * norm.cdf(d_plus, loc=0, scale=1)
    res -= K * np.exp(-r * ttm) * norm.cdf(d_minus, loc=0,scale=1)

    return np.round(res,2)

With @jit(nopython=False, forceobj=True, cache=True) it works fine and is much faster than plain scipy, but with @jit(nopython=True, cache=True) I get an error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function cdf at 0x16e22b8b0>) found for signature:

cdf(float64, loc=Literalint, scale=Literalint)

There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function '_ol_cdf': File: ../../../../../../Desktop/WQU-Course/MScFE/MScFE_620/Module_4/Lesson_3/: Line 0.
With argument(s): '(float64, loc=int64, scale=int64)':
Rejected as the implementation raised a specific error:
TypingError: first argument must be an array of floating point type

Passing arrays as arguments, norm.cdf(np.array(d_plus), loc=np.array(0.0), scale=np.array(1.0)), raises an error:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function cdf at 0x16e22b8b0>) found for signature:

cdf(array(float64, 0d, C), loc=array(float64, 0d, C), scale=array(float64, 0d, C))

There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload in function '_ol_cdf': File: ../../../../../../Desktop/WQU-Course/MScFE/MScFE_620/Module_4/Lesson_3/: Line 0.
With argument(s): '(array(float64, 0d, C), loc=array(float64, 0d, C), scale=array(float64, 0d, C))':
Rejected as the implementation raised a specific error:
TypingError: argument 1 must be of type array(float64, 0d, C)

But I can't understand where the issue is (the argument seems fine to me).
numba-stats 1.1.0
numba 0.56.3
numpy 1.23.3
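
The first error message says that the first argument must be an array of floating point type, so a scalar d_plus is rejected, and the second attempt shows that 0-d arrays are rejected too. For purely scalar inputs, one workaround that does not depend on numba-stats is the error-function form of the normal CDF. This is a minimal sketch; the function name `norm_cdf_scalar` is hypothetical. Numba supports math.erfc in nopython mode, so the function can also be decorated with @njit.

```python
import math


def norm_cdf_scalar(x, loc=0.0, scale=1.0):
    """Normal CDF for a scalar x, written with the complementary error function."""
    z = (x - loc) / scale
    # erfc stays accurate in the lower tail, where 1 + erf(...) would lose precision.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


print(norm_cdf_scalar(0.0))
print(norm_cdf_scalar(1.96))
```

In bs_call_price this would replace both norm.cdf calls, since d_plus and d_minus are scalars there.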

Add lognormal

log pdf?

Thank you for adding the lognormal!
Since it looks like you (almost) calculate the log and then return the exponent, could you add a logpdf function?
Since one of the main use-cases is in log-likelihood fits, having a logpdf as well would speed things up on the user's side.
The scipy.stats implementation does have a logpdf function (and a logcdf...)
Same arguments apply to the normal.

Thank you again!
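
To illustrate the argument for the normal case, here is a minimal scalar sketch that compares a direct log-density with the logarithm of the density. The function names are hypothetical and this is not the numba-stats code. Far in the tail the density underflows to zero, so its logarithm is lost, while the direct formula stays finite and skips the exp/log round trip.

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)


def norm_pdf(x, mu, sigma):
    """Normal density for a scalar x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def norm_logpdf(x, mu, sigma):
    """Logarithm of the normal density, computed without exponentiating first."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - LOG_SQRT_2PI


print(norm_pdf(40.0, 0.0, 1.0))     # underflows to 0.0, so its log is undefined
print(norm_logpdf(40.0, 0.0, 1.0))  # stays finite
```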

feat: add RooCMSShape PDF in numba-stats

Hi @HDembinski, would it seem interesting to you to add the RooCMSShape PDF defined here in cmssw: https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/TagAndProbe/src/RooCMSShape.cc to numba-stats? It's pretty common within CMS to use it to fit Z peaks for tag & probe studies and it can be built from existing distributions.