jelleaalbers / blueice
Build Likelihoods Using Efficient Interpolations and Monte Carlo-generated Events
License: BSD 3-Clause "New" or "Revised" License
When calling h.rebin(1, .5), even when rebinning to an integer fraction of the number of bins, there seems to be a loss of events, whereas you would expect each new, larger bin to contain the sum of the original bins it merges.
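The expected behavior can be pinned down with a toy example (pure Python; `rebin_by_factor` is a hypothetical helper, not the histogram library's implementation): merging bins must conserve the total event count exactly.

```python
def rebin_by_factor(counts, factor):
    """Merge every `factor` adjacent bins by summing their contents.

    The total number of events must be conserved exactly.
    """
    if len(counts) % factor != 0:
        raise ValueError("factor must divide the number of bins")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

counts = [1, 2, 3, 4, 5, 6]
merged = rebin_by_factor(counts, 2)
# merged == [3, 7, 11]; 21 events before and after
```

A regression test along these lines would make any event loss easy to demonstrate.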
At the moment Model doesn't pickle, so the models that are computed in parallel during LogLikelihood.prepare() can't actually be returned. It still works because the computed PDFs end up in the PDF cache: we can then load them serially in the main process much faster... as long as the main process and the ipyparallel engines share the same pdf_cache location. If not, e.g. if the engines are actually on a different machine, parallel computation doesn't work yet.
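The root cause is easy to reproduce: an instance whose state contains something unpicklable (a lambda, an open file handle, a locally defined function) cannot cross process boundaries. A self-contained illustration; this Model is a stand-in, not blueice's actual class:

```python
import pickle

class Model:
    """Stand-in for a model object holding an unpicklable attribute."""
    def __init__(self):
        self.interpolator = lambda x: x  # lambdas cannot be pickled

def is_picklable(obj):
    """Return True if `obj` survives a pickle round-trip attempt."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

picklable = is_picklable(Model())  # False: this is why parallel return fails
```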
The implementation for rate parameters assumes the fraction of events that fall in the analysis space is the same for every model. This is not true when there are shape parameters.
A related problem exists for rate uncertainties (rate parameters with a prior). The prior is currently defined over the absolute rate, but this is affected by shape parameters too.
I'm going to switch rate parameters to be multipliers of the nominal rate to get out of this mess. This will mean a few examples in laidbax have to be changed.
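The difference can be illustrated with toy numbers (all names here are hypothetical, not blueice's API): with an absolute-rate parameter, the expected count in the analysis space silently changes meaning when a shape parameter changes the fraction of events inside it; a rate multiplier keeps the parameter's interpretation fixed.

```python
def expected_events(base_rate, rate_multiplier, fraction_in_space):
    """Expected events in the analysis space when the rate parameter
    is a multiplier of the source's nominal rate."""
    return rate_multiplier * base_rate * fraction_in_space

# A shape parameter moves the fraction of events inside the analysis space:
nominal = expected_events(base_rate=100.0, rate_multiplier=1.0,
                          fraction_in_space=0.8)
shifted = expected_events(base_rate=100.0, rate_multiplier=1.0,
                          fraction_in_space=0.6)
# The multiplier still means "scale of the nominal rate" in both cases;
# only the prediction changes (about 80 vs 60 expected events).
```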
It would be nice if there were a bestfit_minuit just like bestfit_scipy. Physicists like minuit.
According to https://github.com/JelleAalbers/blueice/blob/master/blueice/inference.py#L206, our minuit implementation does not yet account for bounds. That's probably OK, since the likelihood function just gives -float('inf') outside physical bounds, but maybe minuit would appreciate it (and perhaps be more reliable) if we told it our bounds more gently.
If we're planning to use or report the errors from minuit at some point, we should also set the error_def parameter to 0.5. According to https://nbviewer.jupyter.org/github/iminuit/iminuit/blob/master/tutorial/basic_tutorial.ipynb this sets up the right magic for negative log-likelihood functions.
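The factor 0.5 comes from the likelihood-ratio definition of parameter errors: for a negative log-likelihood, the one-sigma interval is where -lnL rises by 0.5 above its minimum (for a least-squares cost it rises by 1, which is why the default differs). A self-contained check with a Gaussian likelihood, no iminuit needed:

```python
import math

def neg_log_likelihood(mu, x=2.0, sigma=3.0):
    """-ln L for a single Gaussian observation x with known sigma
    (constant terms dropped)."""
    return (x - mu) ** 2 / (2 * sigma ** 2)

best = neg_log_likelihood(2.0)             # minimum: mu equals the observation
one_sigma = neg_log_likelihood(2.0 + 3.0)  # mu exactly one sigma away
delta = one_sigma - best                   # -lnL rises by exactly 0.5
```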
Currently our Travis builds are failing with the very unhelpful "core dumped" message. I first thought it was related to iminuit, but removing all iminuit installation instructions hasn't helped. I hope this is a temporary problem on the Travis CI side; otherwise we may have to use tools like https://github.com/springmeyer/travis-coredump to diagnose it.
Currently LogLikelihood.__call__ accepts **kwargs and searches it for arguments corresponding to known rate and shape parameters. Thus, if you mistype a parameter, it is silently ignored and appears to do nothing. It would be better to get an error when this happens instead.
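A minimal sketch of the proposed check, assuming the likelihood object knows its parameter names (`known_parameters` and the toy class below are hypothetical, not blueice's actual attributes):

```python
class LogLikelihood:
    """Toy stand-in illustrating the proposed validation in __call__."""
    known_parameters = {"er_rate_multiplier", "wimp_mass"}

    def __call__(self, **kwargs):
        unknown = set(kwargs) - self.known_parameters
        if unknown:
            raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")
        return 0.0  # the real implementation would evaluate the likelihood

ll = LogLikelihood()
result = ll(wimp_mass=50)   # fine
try:
    ll(wimp_maas=50)        # typo now raises instead of being ignored
except ValueError as e:
    caught = str(e)
```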
When starting multiple batch jobs on Midway, it is necessary to run a "burn-in" run beforehand; otherwise multiple jobs may attempt to write to the same cache file and corrupt it. Deleting the corrupted cache and re-running is then required.
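One way to make the cache robust to concurrent writers without a burn-in run is an atomic write: each job writes to a unique temporary file in the cache directory and then renames it into place (`os.replace` is atomic on POSIX filesystems). A sketch under that assumption, not blueice's current code:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write `data` (bytes) to `path` so readers never see a partial file.

    Concurrent writers still race, but the loser simply overwrites the
    file with an identical cache entry instead of corrupting it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename into place
    except BaseException:
        os.remove(tmp)
        raise

# Demo in a throwaway directory
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "pdf_cache.bin")
atomic_write(demo_path, b"cached pdf bytes")
with open(demo_path, "rb") as f:
    restored = f.read()
```

Note this only protects single-file writes; a shared network filesystem with non-POSIX rename semantics would need a different scheme (e.g. per-job cache files merged afterwards).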
In some analyses, the analysis may be performed in a space that partially depends on nuisance parameters, e.g. reconstructed energy. Source and model classes could be modified to account for this, for example by interpolating the derived quantities between anchor points.
Travis has shut down its travis-ci.org site and no longer offers free builds for open-source projects. We should migrate to GitHub Actions to maintain continuous-integration testing if people still want to develop blueice.
Currently, blueice instantiates a model at every point of the anchor grid spanned by the shape parameter variations from all sources. This extends to the pdf interpolation.
The pdfs are cached and only loaded into memory once, so the second problem poses a larger challenge than the first.
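The cost of a shared grid is multiplicative: every source is evaluated at the Cartesian product of all sources' anchor points, including anchors that only matter for one source. A toy illustration (parameter names are hypothetical):

```python
from itertools import product

# Hypothetical anchor points; suppose only source A cares about `leakage`
# and only source B cares about `field`, yet the shared grid spans both.
anchors = {
    "leakage": [0.8, 1.0, 1.2],
    "field": [80, 120],
    "yield": [0.9, 1.0, 1.1],
}
shared_grid = list(product(*anchors.values()))
# 3 * 2 * 3 = 18 models per source, instead of the handful of anchors
# each source actually needs along its own parameters.
```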
Proposed steps:
I am looking at adding functionality for binned likelihoods in blueice (the first WIMP analysis will probably be binned, with bins derived from calibration data).
As the available statistics vary greatly over the analysis range, I believe bin size and shape should be as flexible as possible; my preference would be to use a function of the event data that returns a bin index.
My first idea was to simply re-purpose XENONSource: add an analysis variable called "bin_index", compute it for each event, and use the index as an analysis variable. However, as both the Source and Likelihood classes would need modification, my current approach is to create BinnedSource, Model, and Likelihood classes inheriting from the unbinned counterparts and re-implementing what is needed. Does this sound sensible?
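The "function of the event data that returns a bin index" idea could be as simple as a lookup over irregular edges; a stdlib sketch with hypothetical names (in practice the edges would come from calibration data):

```python
import bisect

# Irregular energy bin edges, e.g. derived from calibration statistics
energy_edges = [2.0, 5.0, 10.0, 20.0, 50.0]

def bin_index(event):
    """Map an event (here a dict of observables) to a flat bin index.

    Returns -1 for events outside the binning, so under/overflow can be
    dropped or collected separately.
    """
    e = event["energy"]
    i = bisect.bisect_right(energy_edges, e) - 1
    if i < 0 or i >= len(energy_edges) - 1:
        return -1
    return i

idx = bin_index({"energy": 7.3})  # falls in [5, 10), i.e. bin 1
```

Because any callable works here, non-rectangular or data-driven binnings (e.g. equal-statistics bins) drop in without changing the likelihood code.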
Blueice's minuit wrappers assume the minuit 1 API. The minuit 2 API is slightly different, see here. Unfortunately this means blueice's minuit inference methods no longer work in post-2020 setups; instead we get an error about a missing 'print_level' initialization argument. That could be all, or it could be the tip of the iceberg.
Currently you can only vary parameters of the model as shape uncertainties, not parameters of each individual source (such as its energy distribution).
Currently the statistical errors on a PDF for a given combination of parameters are assumed to be negligible. When you make a PDF from MC, you can often get to this happy point if you are patient, but not when deriving a PDF from data.
It would be nice if there were an option to have a parameter vary the expectation in each bin of each PDF used, with a corresponding Poisson term in the likelihood -- or at least one such parameter/term for each bin of the total PDF. However:
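One common recipe along these lines (not currently in blueice; in the spirit of "Barlow-Beeston-lite" treatments of MC statistics) gives each bin a nuisance multiplier on the expectation, constrained by the relative MC uncertainty in that bin. A sketch of one bin's likelihood contribution, with all names hypothetical and a Gaussian constraint chosen for simplicity:

```python
import math

def neg_log_likelihood_bin(n_obs, mu, t, sigma_mc):
    """One bin's contribution: extended-Poisson term for the observed
    count with the expectation mu scaled by nuisance multiplier t, plus
    a Gaussian constraint tying t to 1 with relative MC error sigma_mc
    (constant terms dropped)."""
    expected = mu * t
    poisson = expected - n_obs * math.log(expected)
    constraint = (t - 1.0) ** 2 / (2 * sigma_mc ** 2)
    return poisson + constraint

# With a loose constraint, pulling t toward n_obs / mu lowers the total:
at_nominal = neg_log_likelihood_bin(n_obs=12, mu=10.0, t=1.0, sigma_mc=0.5)
at_pulled = neg_log_likelihood_bin(n_obs=12, mu=10.0, t=1.2, sigma_mc=0.5)
```

The downside the issue alludes to is real: this adds one nuisance parameter per bin per PDF, which the minimizer then has to profile out.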
It might be desirable to subtract two pdfs. (say, to avoid double-counting of a contaminant of a calibration source)
Proposed implementation:
- With a suitable flag, allow a source's rate_multiplier to be negative.
- If such a flag is set, truncate the summed pdf at zero.
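A sketch of the proposed summation with truncation (pure Python over per-bin pdf values; note that clipping at zero changes the normalization, so the result would need renormalizing before use as a pdf):

```python
def summed_pdf(pdfs, multipliers):
    """Sum per-bin pdf values weighted by (possibly negative) rate
    multipliers, truncating negative bins at zero."""
    total = [
        sum(m * p[i] for m, p in zip(multipliers, pdfs))
        for i in range(len(pdfs[0]))
    ]
    return [max(v, 0.0) for v in total]

calibration = [0.2, 0.5, 0.3]
contaminant = [0.0, 0.2, 0.8]
# Subtract the contaminant with a negative multiplier:
clean = summed_pdf([calibration, contaminant], [1.0, -0.5])
# clean is approximately [0.2, 0.4, 0.0]; the last bin went negative
# (0.3 - 0.4) and was clipped to zero.
```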