phasespace's Introduction

zfit logo

zfit: scalable pythonic fitting

DOI: 10.1016/j.softx.2020.100508

zfit is a highly scalable and customizable model manipulation and likelihood fitting library. It uses TensorFlow as its computational backend and is optimised for simple and direct manipulation of probability density functions. The project is affiliated with and well integrated into Scikit-HEP, the HEP Python ecosystem.

If you use zfit in research, please consider citing it.

N.B.: zfit is currently in beta stage, so while most core parts are established, some features may still be missing and bugs may be encountered. It is, however, mostly ready for production and is being used in analysis projects. If you want to use it for your project and you are not sure whether all the needed functionality is there, feel free to contact us.

Installation

zfit is available on PyPI and conda-forge. To install it (recommended: use a virtual/conda env!) with all the dependencies (minimizers, uproot, ...), use

pip install -U zfit[all]

(the -U flag simply upgrades zfit in case it is already installed) or, for minimal dependencies,

pip install zfit

For conda/mamba, use

conda install -c conda-forge zfit

How to use

While the zfit library provides a model fitting and sampling framework for a broad range of applications, we will illustrate its main features with a simple example: fitting a Gaussian distribution with an unbinned likelihood fit and estimating a parameter uncertainty.

Example in short

import numpy as np
import zfit

obs = zfit.Space('x', limits=(-10, 10))

# create the model
mu    = zfit.Parameter("mu"   , 2.4, -1, 5)
sigma = zfit.Parameter("sigma", 1.3,  0, 5)
gauss = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)

# load the data
data_np = np.random.normal(size=10000)
data = zfit.Data.from_numpy(obs=obs, array=data_np)

# build the loss
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# minimize
minimizer = zfit.minimize.Minuit()
result = minimizer.minimize(nll).update_params()

# calculate errors
param_errors = result.hesse()

This follows the zfit workflow

zfit workflow

Full explanation

The default space (e.g. normalization range) of a PDF is defined by an observable space, which is created using the zfit.Space class:

obs = zfit.Space('x', limits=(-10, 10))
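
Spaces can also be combined into higher-dimensional observable spaces; a minimal sketch, assuming the product operator for spaces available in recent zfit versions (the names obs_x, obs_y are illustrative):

obs_x = zfit.Space('x', limits=(-10, 10))
obs_y = zfit.Space('y', limits=(0, 5))
obs_2d = obs_x * obs_y  # two-dimensional space with observables ('x', 'y')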

To create a simple Gaussian PDF, we define its parameters and their limits using the zfit.Parameter class.

# syntax: zfit.Parameter("any_name", value, lower, upper)
mu    = zfit.Parameter("mu"   , 2.4, -1, 5)
sigma = zfit.Parameter("sigma", 1.3,  0, 5)
gauss = zfit.pdf.Gauss(obs=obs, mu=mu, sigma=sigma)
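
Parameters can also be created as fixed, or frozen later via their floating attribute; a minimal sketch (the parameter name n_bkg is illustrative, and freezing sigma here is for demonstration only; the fit below keeps it floating):

# a parameter the minimiser will not vary
n_bkg = zfit.Parameter("n_bkg", 500, floating=False)

# alternatively, freeze an existing parameter (illustration only)
# sigma.floating = False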

For simplicity, we create the dataset to be fitted starting from a numpy array, but zfit also supports other sources such as ROOT files (a sketch follows below):

mu_true = 0
sigma_true = 1
data_np = np.random.normal(mu_true, sigma_true, size=10000)
data = zfit.Data.from_numpy(obs=obs, array=data_np)
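
Reading from a ROOT file might look like the following sketch; the file name, tree name, and branch are hypothetical, and the exact keyword names of Data.from_root should be checked against the documentation of your zfit version:

# hypothetical ROOT file "data.root" with a tree "tree" containing a branch "x"
data = zfit.Data.from_root("data.root", treepath="tree", branches=["x"])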

Fits are performed in three steps:

  1. Creation of a loss function, in our case a negative log-likelihood.
  2. Instantiation of our minimiser of choice, in this example Minuit.
  3. Minimisation of the loss function.

# Stage 1: create an unbinned likelihood with the given PDF and dataset
nll = zfit.loss.UnbinnedNLL(model=gauss, data=data)

# Stage 2: instantiate a minimiser (in this case a basic minuit)
minimizer = zfit.minimize.Minuit()

# Stage 3: minimise the given negative log-likelihood
result = minimizer.minimize(nll).update_params()
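
As a variant worth knowing, an extended likelihood (which also fits the total event yield) can be built in the same way; a minimal sketch using create_extended and ExtendedUnbinnedNLL, with an illustrative yield parameter:

# hypothetical yield parameter for an extended fit
n_sig = zfit.Parameter("n_sig", 10000, 0, 100000)
gauss_extended = gauss.create_extended(n_sig)
nll_extended = zfit.loss.ExtendedUnbinnedNLL(model=gauss_extended, data=data)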

Errors are calculated with a further function call to avoid running potentially expensive operations if not needed:

param_errors = result.hesse()
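
Besides the symmetric Hesse errors, asymmetric profile-likelihood errors are available as well; a minimal sketch:

# minos-like asymmetric errors; a new FitResult is also returned
# in case a new minimum was found while profiling
errors, new_result = result.errors()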

Once we've performed the fit and obtained the corresponding uncertainties, we can examine the fit results:

print("Function minimum:", result.fmin)
print("Converged:", result.converged)
print("Full minimizer information:", result)

# Information on all the parameters in the fit
params = result.params
print(params)

# Printing information on specific parameters, e.g. mu
print("mu={}".format(params[mu]['value']))

And that's it! For more details and information on what you can do with zfit, check out the latest documentation.

Why?

The basic idea behind zfit is to offer a Python-oriented alternative to the very successful RooFit library from the ROOT data analysis package, one that can integrate with the other packages of the scientific Python ecosystem. Contrary to the monolithic approach of ROOT/RooFit, the aim of zfit is to be light and flexible enough to integrate with any state-of-the-art tools and to scale to larger datasets.

These core ideas are supported by two basic pillars:

  • The skeleton and extension of the code is minimalist, simple and finite: the zfit library is designed exclusively for model fitting and sampling, with no attempt to extend its functionality to features such as statistical methods or plotting.
  • zfit is designed for optimal parallelisation and scalability by using TensorFlow as its backend. TensorFlow provides crucial features in the context of model fitting, such as taking care of parallelisation and providing analytic gradients via automatic differentiation (see the sketch below).
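
The gradients mentioned above come from TensorFlow's automatic differentiation; a generic TensorFlow sketch of the mechanism (plain TF, not zfit API):

import tensorflow as tf

mu = tf.Variable(2.4)
with tf.GradientTape() as tape:
    # toy loss: quadratic in mu, minimal at mu = 0
    loss = mu ** 2
grad = tape.gradient(loss, mu)  # analytic derivative 2 * mu, here 4.8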

Prerequisites

zfit works with Python versions 3.9 and above. The main dependency is TensorFlow; zfit closely tracks TensorFlow's version compatibility.

For a full list of all dependencies, check the requirements.

Installing

zfit is currently available on PyPI and conda-forge.

If possible, use a conda or virtual environment and do:

$ pip install zfit

For the newest development version, you can install the version from git with

$ pip install git+https://github.com/zfit/zfit

Contributing

Any idea of how to improve the library? Or interested in writing some code? Contributions are always welcome; please have a look at the Contributing guide.

Contact

You can contact us directly:

Original Authors

Jonas Eschle <[email protected]>
Albert Puig <[email protected]>
Rafael Silva Coutinho <[email protected]>

See here for all authors and contributors

Acknowledgements

zfit has been developed with support from the University of Zurich and the Swiss National Science Foundation (SNSF) under contracts 168169 and 174182.

The idea of zfit is inspired by the TensorFlowAnalysis framework developed by Anton Poluektov and by TensorProb, developed by Chris Burr and Igor Babuschkin, both built on the open-source TensorFlow library.

phasespace's People

Contributors

apuignav, danielskatz, dependabot[bot], eduardo-rodrigues, gitter-badger, jonas-eschle, pre-commit-ci[bot], redeboer, simonthor


phasespace's Issues

Release version mixup

Reviewing this for JOSS, I found that installing from the source code and installing with pip install phasespace produced different versions of the code, both with version number 1.0.2.

Installing from pip gives me an older version of the code with the deprecated phasespace.Particle class instead of phasespace.GenParticle.

Add post-generation hooks

We could add the possibility of running arbitrary "post-generation hooks" on particles. This would allow applying smearing or other detector effects à la RapidSim.

Enhance decaychain constructor

Improvement of the constructor:
In practice I would always define the zfit parameter via something like this example I posted at some point in this review:

from decaylanguage import DecayMode, DecayChain
dm1 = DecayMode(1, 'K- pi+ pi+ pi0', model='PHSP', zfit="rel-BW")
dm2 = DecayMode(1, 'gamma gamma')
dc = DecayChain('D+', {'D+':dm1, 'pi0':dm2})

(It is not the same as your example but the point I want to make is obvious.)

This makes me think that it may be rather relevant to enhance this constructor to provide the ability to specify a map between particle and model. In your case I could simply have an argument of the kind particle_model_map={"D0": "gauss"} and the step above would be done internally, which is so much better for the user. Of course this would imply that all D0s get the same model, and one would need to define aliases to differentiate two D0's, but that's no different from what decay files do, so not an issue at all.

You may want to delay this to a follow-up MR. Dunno, up to you.

Originally posted by @eduardo-rodrigues in #63 (comment)

Be explicit about the units you use

Hi all, it seems to me that you are using MeV for energy/mass units, and in fact following the HEP system of units. May I suggest that you make this explicit in the README and also in relevant bits of docstrings?

In fact, given that Scikit-HEP and zfit are affiliated now (info to go online soon :-)), and given that we have the little and trivial hepunits package with a bunch of units and constants, would you consider making use of the package in some (at least, alternative) examples? I mean, you can have your simple examples as now but could also have other more elaborate/neat examples as extra, or in a notebook. Consider

import phasespace
from hepunits import MeV

B0_MASS = 5279.58 * MeV
PION_MASS = 139.57018 * MeV
KAON_MASS = 493.677 * MeV

weights, particles = phasespace.generate_decay(...)

[Feature] expose weight calculation of events as method

It could be useful to have a method that returns the event weights or, more directly, one that tells whether something is inside or outside the phasespace (as @abhijitm08 asked for). I think it fits well into the scope of phasespace, most of all since the Particle knows everything it has to know.

My idea: extract the pdk function and expose it (https://github.com/zfit/phasespace/blob/master/phasespace/phasespace.py#L343). Make sure it returns something negative (or 0) if the decay is not allowed. The selection itself (a tf.where(weights > 0, ...)) can be done on the user side and does not have to be part of Particle.

What do you think, @apuignav?
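
For illustration, the user-side selection proposed above could look like this generic TensorFlow sketch, where weights and events are hypothetical outputs of a generation call:

import tensorflow as tf

# keep only kinematically allowed events (positive weight)
mask = weights > 0
selected = tf.boolean_mask(events, mask)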

not compatible with TF 2.0

Hi, I am trying to run the phasespace example in the documentation, and when using phasespace.nbody_decay I get the error:

TypeError: convert_to_tensor_v2() got an unexpected keyword argument 'preferred_dtype'

Create docs

Create a doc structure, with API and a bit of README.

Move to TF2.0

As seen in #33, we're not yet TF 2.0 compatible.

It actually improves things a lot for phasespace: you only write functions and TF automatically does the graph building, running etc. for you.

My proposal for the new API: have only one API that returns a Tensor. Since this is lazily evaluated, the Tensor can act as a numpy array in many situations and has a .numpy() method that converts it to numpy.

To change: we release a version restricting TF to < 2 and warn about the new behavior. Then we make a completely new release (2.0 even?) where we change the behavior and restrict to TF >= 2. Do you agree, @apuignav?
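
For context, the eager behavior described above looks like this in TF 2 (plain TensorFlow, not the phasespace API):

import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])  # evaluated eagerly in TF 2
print(t.numpy())                  # converts to a numpy array
print(t + 1)                      # tensors act like arrays in many situations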

Examples of interaction with other packages

It would be nice to write a set of notebooks showcasing how phasespace can be used together with packages such as zfit, particle, etc, and also how to do things like generating LHC-like kinematics, etc.

Convert TF calls to experimental numpy

We also recently moved to the tnp interface in zfit (zfit/zfit#310), which seemed to mostly work, but it's tricky as it is still experimental, and they mean it. For example, while before a TF ndarray was returned, it is now a tf.Tensor again; however, a new numpy-behavior switch was introduced (https://www.tensorflow.org/api_docs/python/tf/experimental/numpy/experimental_enable_numpy_behavior).

But overall it's clear that we want to go in this direction, to have not only numpy but also jax etc. as possible backends in the future.
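
A minimal sketch of the tnp interface referenced above (TensorFlow's experimental numpy API; it is explicitly experimental, so details may change):

import tensorflow.experimental.numpy as tnp

tnp.experimental_enable_numpy_behavior()  # opt in to numpy semantics on tf.Tensor

a = tnp.asarray([1.0, 2.0, 3.0])
print(tnp.mean(a))  # numpy-style call, executed by TensorFlow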

Upgrade random number generation

Replace the deprecated tf.random.uniform with the new RandomGenerator

Open question: how to expose this to the user?

Proposal: follow the API of TFP's distributions interface for "sample", which allows for a seed or a SeedStream. Not sure whether to allow the latter.

Looking for an example of using `boost_to` in `generate`?

Hi Maintainers!

I'm trying to use phasespace for a small study and I need to boost the initial particle in some direction. I'm trying to write stuff like

Z_MASS = 91000.00
MUON_MASS = 105.11

weights, particles = phasespace.nbody_decay(Z_MASS,
                                            [MUON_MASS, MUON_MASS]).generate(n_events=1, boost_to=np.array([10000,0,0, 9000000]))

and obviously I'm not getting it. I'm not really sure what to pass in for the boost_to option. Do you have any examples? Thanks!
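
A plausible way to build the boost_to argument, assuming it expects one (px, py, pz, E) four-momentum per event and that the four-momentum should be on-shell; the masses and momentum below are illustrative:

import numpy as np
import phasespace

Z_MASS = 91187.6     # MeV, illustrative
MUON_MASS = 105.66   # MeV, illustrative

px = 10000.0
energy = np.sqrt(px**2 + Z_MASS**2)  # on-shell energy for the boosted Z
momentum = np.array([[px, 0.0, 0.0, energy]])  # shape (n_events, 4)

weights, particles = phasespace.nbody_decay(
    Z_MASS, [MUON_MASS, MUON_MASS]).generate(n_events=1, boost_to=momentum)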

Stricter user-facing interface

Right now, the user can provide a tensor of input momenta to generate. This is then used to calculate the number of events (if n_events is not given), and events are generated according to these momenta.

The proposal is to remove the momentum argument and simply use the mass of the top Particle, and add an additional boost argument to the user-facing functions such that the generated events are boosted afterwards. This would be needed in all methods discussed in #15.

This makes generation much simpler (everything is built from the particle at rest, so accept-reject procedures can simply throw events away and generate again) while at the same time providing the possibility to generate physics-like distributions.

High level interface

@mayou36 We discussed yesterday and I already forgot what we wanted to do. Let me write down what I remember:

  • We have a generate function that accepts
    • n as int or tf.Variable. Internally, it converts n to tf.Variable if it's an int.
    • Has a lazy optional parameter.
    • Has a chunk_size optional parameter.
      This is the highest level function, and the one that users will usually make use of.
  • We have a generate_tensor (let's improve the name) that explicitly requires the tf.Variable and always returns a tensor. Internally, generate calls generate_tensor and does the loop.

Is this it?

Generation is quite slow

On my laptop:

[21:03]~/Arxiu/Fisica/LHCb/Projectes/zfit/tfphasespace[master](zfit-qokI31rn)$ python3 benchmark/bench_tfphasespace.py
2019-02-27 21:03:49.356016: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Elapsed time: 4126.743583000007 ms
[21:04]~/Arxiu/Fisica/LHCb/Projectes/zfit/tfphasespace[master](zfit-qokI31rn)$ time root -q benchmark/bench_tgenphasespace.cxx+

Processing benchmark/bench_tgenphasespace.cxx+...
(int) 0
root -l -q benchmark/bench_tgenphasespace.cxx+  0.24s user 0.12s system 101% cpu 0.344 total

I've factored out the import time of tensorflow, but still, not great. Also, generating large (1e9) samples is basically impossible: memory explodes (I understand that, though; benchmark/bench_tgenphasespace.cxx doesn't save any info).

What can we do about it?
