pyhf / pyhf-tutorial Goto Github PK

View Code? Open in Web Editor NEW

8.0 8.0 4.0 657 KB

Updating tutorial on pyhf

Home Page: https://pyhf.github.io/pyhf-tutorial/

License: Apache License 2.0

Makefile 0.03% Jupyter Notebook 99.00% Shell 0.38% Python 0.60%

pyhf

pyhf-tutorial's People

Contributors

Stargazers

Watchers

Forkers

paririeck spam-izra peterridolfi pyhf

pyhf-tutorial's Issues

Binder link missing

The launch button seems to be missing for binder. Not sure why, config seems right on a first look-through.

Pseudo-experiments notebook is giving `nan`s

Following PR #61, in the Playing with Toys notebook, in the Hypothesis Testing (low stats) section using the toybased calculator is yielding nans in the calculation:

CLs_obs, CLs_exp = pyhf.infer.hypotest(
    1.0,  # null hypothesis
    [5.0, 7.0] + model.config.auxdata,
    model,
    test_stat="qtilde",
    return_expected_set=True,
    calctype="toybased",
    ntoys=1000,
    track_progress=False,
)
print(f"      Observed CLs: {CLs_obs:.4f}")
for expected_value, n_sigma in zip(CLs_exp, np.arange(-2, 3)):
    print(f"Expected CLs({n_sigma:2d} σ): {expected_value:.4f}")

/usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_optimize.py:353: RuntimeWarning: Values in x were outside bounds during a minimize step, clipping to bounds
  warnings.warn("Values in x were outside bounds during a "

      Observed CLs: nan
Expected CLs(-2 σ): 0.0000
Expected CLs(-1 σ): 0.0452
Expected CLs( 0 σ): 0.1756
Expected CLs( 1 σ): 0.4375
Expected CLs( 2 σ): 1.0000

/usr/local/venv/lib/python3.10/site-packages/pyhf/infer/calculators.py:864: RuntimeWarning: invalid value encountered in divide
  CLs = tensorlib.astensor(CLsb / CLb)

Note Belle II tutorials and add subsections for different experiments

This is a reminder note that Belle II is now giving pyhf tutorials (:exploding_head:) as well. At the 2023 Belle II Summer Workshop Slavomira Stefkova gave a talk on systematics with pyhf and @lorenzennio gave a Belle II focused pyhf tutotrial.

In conversation with @kratsg we agreed that it would be interesting to learn from these tutorials and see if there are components we can pull into the pyhf user guide / tutorial and if there are experiment specific subsections that we could start to add as well.

Model-independent Upper Limits section of Upper Limits notebook failing to execute in CI

I'm not sure why, but the Upper Limits notebook that was added in PR #57 is not running all the way through in the CI that generates the Jupyter Book webpage and is failing at the Model-independent Upper Limits section with a KeyboardInterrupt:

pyhf.set_backend("numpy", "scipy")

mu_tests = np.linspace(15, 25, 10)
(
    obs_limit,
    exp_limit_and_bands,
    (poi_tests, tests),
) = pyhf.infer.intervals.upper_limits.upper_limit(
    data, model, scan=mu_tests, level=0.05, return_results=True
)

Full error output:

---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[11], line 8
      1 pyhf.set_backend("numpy", "scipy")
      3 mu_tests = np.linspace(15, 25, 10)
      4 (
      5     obs_limit,
      6     exp_limit_and_bands,
      7     (poi_tests, tests),
----> 8 ) = pyhf.infer.intervals.upper_limits.upper_limit(
      9     data, model, scan=mu_tests, level=0.05, return_results=True
     10 )

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/intervals/upper_limits.py:252, in upper_limit(data, model, scan, level, return_results, **hypotest_kwargs)
    210 """
    211 Calculate an upper limit interval ``(0, poi_up)`` for a single Parameter of
    212 Interest (POI) using root-finding or a linear scan through POI-space.
   (...)
    249 .. versionadded:: 0.7.0
    250 """
    251 if scan is not None:
--> 252     return linear_grid_scan(
    253         data, model, scan, level, return_results, **hypotest_kwargs
    254     )
    255 # else:
    256 bounds = model.config.suggested_bounds()[
    257     model.config.par_slice(model.config.poi_name).start
    258 ]

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/intervals/upper_limits.py:189, in linear_grid_scan(data, model, scan, level, return_results, **hypotest_kwargs)
    146 """
    147 Calculate an upper limit interval ``(0, poi_up)`` for a single
    148 Parameter of Interest (POI) using a linear scan through POI-space.
   (...)
    186 .. versionadded:: 0.7.0
    187 """
    188 tb, _ = get_backend()
--> 189 results = [
    190     hypotest(mu, data, model, return_expected_set=True, **hypotest_kwargs)
    191     for mu in scan
    192 ]
    193 obs = tb.astensor([[r[0]] for r in results])
    194 exp = tb.astensor([[r[1][idx] for idx in range(5)] for r in results])

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/intervals/upper_limits.py:190, in <listcomp>(.0)
    146 """
    147 Calculate an upper limit interval ``(0, poi_up)`` for a single
    148 Parameter of Interest (POI) using a linear scan through POI-space.
   (...)
    186 .. versionadded:: 0.7.0
    187 """
    188 tb, _ = get_backend()
    189 results = [
--> 190     hypotest(mu, data, model, return_expected_set=True, **hypotest_kwargs)
    191     for mu in scan
    192 ]
    193 obs = tb.astensor([[r[0]] for r in results])
    194 exp = tb.astensor([[r[1][idx] for idx in range(5)] for r in results])

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/__init__.py:169, in hypotest(poi_test, data, pdf, init_pars, par_bounds, fixed_params, calctype, return_tail_probs, return_expected, return_expected_set, return_calculator, **kwargs)
    157 _check_hypotest_prerequisites(pdf, data, init_pars, par_bounds, fixed_params)
    159 calc = utils.create_calculator(
    160     calctype,
    161     data,
   (...)
    166     **kwargs,
    167 )
--> 169 teststat = calc.teststatistic(poi_test)
    170 sig_plus_bkg_distribution, bkg_only_distribution = calc.distributions(poi_test)
    172 tb, _ = get_backend()

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/calculators.py:389, in AsymptoticCalculator.teststatistic(self, poi_test)
    378 asimov_mu = 1.0 if self.test_stat == 'q0' else 0.0
    380 asimov_data, asimov_mubhathat = generate_asimov_data(
    381     asimov_mu,
    382     self.data,
   (...)
    387     return_fitted_pars=True,
    388 )
--> 389 qmuA_v, (mubhathat_A, muhatbhat_A) = teststat_func(
    390     poi_test,
    391     asimov_data,
    392     self.pdf,
    393     self.init_pars,
    394     self.par_bounds,
    395     self.fixed_params,
    396     return_fitted_pars=True,
    397 )
    398 self.sqrtqmuA_v = tensorlib.sqrt(qmuA_v)
    399 self.fitted_pars = HypoTestFitResults(
    400     asimov_pars=asimov_mubhathat,
    401     free_fit_to_data=muhatbhat,
   (...)
    404     fixed_poi_fit_to_asimov=mubhathat_A,
    405 )

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/test_statistics.py:234, in qmu_tilde(mu, data, pdf, init_pars, par_bounds, fixed_params, return_fitted_pars)
    229 if par_bounds[pdf.config.poi_index][0] != 0:
    230     log.warning(
    231         'qmu_tilde test statistic used for fit configuration with POI not bounded at zero.\n'
    232         + 'Use the qmu test statistic (pyhf.infer.test_statistics.qmu) instead.'
    233     )
--> 234 return _qmu_like(
    235     mu,
    236     data,
    237     pdf,
    238     init_pars,
    239     par_bounds,
    240     fixed_params,
    241     return_fitted_pars=return_fitted_pars,
    242 )

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/test_statistics.py:27, in _qmu_like(mu, data, pdf, init_pars, par_bounds, fixed_params, return_fitted_pars)
     19 """
     20 Clipped version of _tmu_like where the returned test statistic
     21 is 0 if muhat > 0 else tmu_like_stat.
   (...)
     24 qmu_tilde. Otherwise this is qmu (no tilde).
     25 """
     26 tensorlib, optimizer = get_backend()
---> 27 tmu_like_stat, (mubhathat, muhatbhat) = _tmu_like(
     28     mu, data, pdf, init_pars, par_bounds, fixed_params, return_fitted_pars=True
     29 )
     30 qmu_like_stat = tensorlib.where(
     31     muhatbhat[pdf.config.poi_index] > mu, tensorlib.astensor(0.0), tmu_like_stat
     32 )
     33 if return_fitted_pars:

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/test_statistics.py:51, in _tmu_like(mu, data, pdf, init_pars, par_bounds, fixed_params, return_fitted_pars)
     47 tensorlib, optimizer = get_backend()
     48 mubhathat, fixed_poi_fit_lhood_val = fixed_poi_fit(
     49     mu, data, pdf, init_pars, par_bounds, fixed_params, return_fitted_val=True
     50 )
---> 51 muhatbhat, unconstrained_fit_lhood_val = fit(
     52     data, pdf, init_pars, par_bounds, fixed_params, return_fitted_val=True
     53 )
     54 log_likelihood_ratio = fixed_poi_fit_lhood_val - unconstrained_fit_lhood_val
     55 tmu_like_stat = tensorlib.astensor(
     56     tensorlib.clip(log_likelihood_ratio, 0.0, max_value=None)
     57 )

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/mle.py:131, in fit(data, pdf, init_pars, par_bounds, fixed_params, **kwargs)
    124 # get fixed vals from the model
    125 fixed_vals = [
    126     (index, init)
    127     for index, (init, is_fixed) in enumerate(zip(init_pars, fixed_params))
    128     if is_fixed
    129 ]
--> 131 return opt.minimize(
    132     twice_nll, data, pdf, init_pars, par_bounds, fixed_vals, **kwargs
    133 )

File /usr/local/venv/lib/python3.10/site-packages/pyhf/optimize/mixins.py:193, in OptimizerMixin.minimize(self, objective, data, pdf, init_pars, par_bounds, fixed_vals, return_fitted_val, return_result_obj, return_uncertainties, return_correlations, do_grad, do_stitch, **kwargs)
    190         par_names[index] = None
    191     par_names = [name for name in par_names if name]
--> 193 result = self._internal_minimize(
    194     **minimizer_kwargs, options=kwargs, par_names=par_names
    195 )
    196 result = self._internal_postprocess(
    197     result, stitch_pars, return_uncertainties=return_uncertainties
    198 )
    200 _returns = [result.x]

File /usr/local/venv/lib/python3.10/site-packages/pyhf/optimize/mixins.py:52, in OptimizerMixin._internal_minimize(self, func, x0, do_grad, bounds, fixed_vals, options, par_names)
     34 def _internal_minimize(
     35     self,
     36     func,
   (...)
     42     par_names=None,
     43 ):
     44     minimizer = self._get_minimizer(
     45         func,
     46         x0,
   (...)
     50         par_names=par_names,
     51     )
---> 52     result = self._minimize(
     53         minimizer,
     54         func,
     55         x0,
     56         do_grad=do_grad,
     57         bounds=bounds,
     58         fixed_vals=fixed_vals,
     59         options=options,
     60     )
     62     try:
     63         assert result.success

File /usr/local/venv/lib/python3.10/site-packages/pyhf/optimize/opt_scipy.py:93, in scipy_optimizer._minimize(self, minimizer, func, x0, do_grad, bounds, fixed_vals, options)
     90 else:
     91     constraints = []
---> 93 return minimizer(
     94     func,
     95     x0,
     96     method=method,
     97     jac=do_grad,
     98     bounds=bounds,
     99     constraints=constraints,
    100     tol=tolerance,
    101     options=dict(maxiter=maxiter, disp=bool(verbose), **solver_options),
    102 )

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_minimize.py:705, in minimize(fun, x0, args, method, jac, hess, hessp, bounds, constraints, tol, callback, options)
    702     res = _minimize_cobyla(fun, x0, args, constraints, callback=callback,
    703                             **options)
    704 elif meth == 'slsqp':
--> 705     res = _minimize_slsqp(fun, x0, args, jac, bounds,
    706                           constraints, callback=callback, **options)
    707 elif meth == 'trust-constr':
    708     res = _minimize_trustregion_constr(fun, x0, args, jac, hess, hessp,
    709                                        bounds, constraints,
    710                                        callback=callback, **options)

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_slsqp_py.py:432, in _minimize_slsqp(func, x0, args, jac, bounds, constraints, maxiter, ftol, iprint, disp, eps, callback, finite_diff_rel_step, **unknown_options)
    429     c = _eval_constraint(x, cons)
    431 if mode == -1:  # gradient evaluation required
--> 432     g = append(wrapped_grad(x), 0.0)
    433     a = _eval_con_normals(x, cons, la, n, m, meq, mieq)
    435 if majiter > majiter_prev:
    436     # call callback if major iteration has incremented

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_optimize.py:346, in _clip_x_for_func.<locals>.eval(x)
    344 def eval(x):
    345     x = _check_clip_x(x, bounds)
--> 346     return func(x)

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py:273, in ScalarFunction.grad(self, x)
    271 if not np.array_equal(x, self.x):
    272     self._update_x_impl(x)
--> 273 self._update_grad()
    274 return self.g

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py:256, in ScalarFunction._update_grad(self)
    254 def _update_grad(self):
    255     if not self.g_updated:
--> 256         self._update_grad_impl()
    257         self.g_updated = True

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py:173, in ScalarFunction.__init__.<locals>.update_grad()
    171 self._update_fun()
    172 self.ngev += 1
--> 173 self.g = approx_derivative(fun_wrapped, self.x, f0=self.f,
    174                            **finite_diff_options)

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_numdiff.py:505, in approx_derivative(fun, x0, method, rel_step, abs_step, f0, bounds, sparsity, as_linear_operator, args, kwargs)
    502     use_one_sided = False
    504 if sparsity is None:
--> 505     return _dense_difference(fun_wrapped, x0, f0, h,
    506                              use_one_sided, method)
    507 else:
    508     if not issparse(sparsity) and len(sparsity) == 2:

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_numdiff.py:576, in _dense_difference(fun, x0, f0, h, use_one_sided, method)
    574     x = x0 + h_vecs[i]
    575     dx = x[i] - x0[i]  # Recompute dx as exactly representable number.
--> 576     df = fun(x) - f0
    577 elif method == '3-point' and use_one_sided[i]:
    578     x1 = x0 + h_vecs[i]

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_numdiff.py:456, in approx_derivative.<locals>.fun_wrapped(x)
    455 def fun_wrapped(x):
--> 456     f = np.atleast_1d(fun(x, *args, **kwargs))
    457     if f.ndim > 1:
    458         raise RuntimeError("`fun` return value has "
    459                            "more than 1 dimension.")

File /usr/local/venv/lib/python3.10/site-packages/scipy/optimize/_differentiable_functions.py:137, in ScalarFunction.__init__.<locals>.fun_wrapped(x)
    133 self.nfev += 1
    134 # Send a copy because the user may overwrite it.
    135 # Overwriting results in undefined behaviour because
    136 # fun(self.x) will change self.x, with the two no longer linked.
--> 137 fx = fun(np.copy(x), *args)
    138 # Make sure the function returns a true scalar
    139 if not np.isscalar(fx):

File /usr/local/venv/lib/python3.10/site-packages/pyhf/optimize/opt_numpy.py:30, in wrap_objective.<locals>.func(pars)
     28 pars = tensorlib.astensor(pars)
     29 constrained_pars = stitch_pars(pars)
---> 30 return objective(constrained_pars, data, pdf)[0]

File /usr/local/venv/lib/python3.10/site-packages/pyhf/infer/mle.py:53, in twice_nll(pars, data, pdf)
     12 def twice_nll(pars, data, pdf):
     13     r"""
     14     Two times the negative log-likelihood of the model parameters, :math:`\left(\mu, \boldsymbol{\theta}\right)`, given the observed data.
     15     It is used in the calculation of the test statistic, :math:`t_{\mu}`, as defined in Equation (8) in :xref:`arXiv:1007.1727`
   (...)
     51         Tensor: Two times the negative log-likelihood, :math:`-2\ln L\left(\mu, \boldsymbol{\theta}\right)`
     52     """
---> 53     return -2 * pdf.logpdf(pars, data)

File /usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py:945, in Model.logpdf(self, pars, data)
    938 if data.shape[-1] != self.nominal_rates.shape[-1] + len(
    939     self.config.auxdata
    940 ):
    941     raise exceptions.InvalidPdfData(
    942         f'eval failed as data has len {data.shape[-1]} but {self.config.nmaindata + self.config.nauxdata} was expected'
    943     )
--> 945 result = self.make_pdf(pars).log_prob(data)
    947 if (
    948     not self.batch_size
    949 ):  # force to be not scalar, should we changed with #522
    950     return tensorlib.reshape(result, (1,))

File /usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py:907, in Model.make_pdf(self, pars)
    896 """
    897 Construct a pdf object for a given set of parameter values.
    898 
   (...)
    904 
    905 """
    906 pdfobjs = []
--> 907 mainpdf = self.main_model.make_pdf(pars)
    908 if mainpdf:
    909     pdfobjs.append(mainpdf)

File /usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py:636, in _MainModel.make_pdf(self, pars)
    635 def make_pdf(self, pars):
--> 636     lambdas_data = self.expected_data(pars)
    637     return prob.Independent(prob.Poisson(lambdas_data))

File /usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py:698, in _MainModel.expected_data(self, pars, return_by_sample)
    696 tensorlib, _ = get_backend()
    697 pars = tensorlib.astensor(pars)
--> 698 deltas, factors = self._modifications(pars)
    700 allsum = tensorlib.concatenate(deltas + [self.nominal_rates])
    702 nom_plus_delta = tensorlib.sum(allsum, axis=0)

File /usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py:657, in _MainModel._modifications(self, pars)
    653 def _modifications(self, pars):
    654     deltas = list(
    655         filter(
    656             lambda x: x is not None,
--> 657             [self.modifiers_appliers[k].apply(pars) for k in self._delta_mods],
    658         )
    659     )
    660     factors = list(
    661         filter(
    662             lambda x: x is not None,
    663             [self.modifiers_appliers[k].apply(pars) for k in self._factor_mods],
    664         )
    665     )
    667     return deltas, factors

File /usr/local/venv/lib/python3.10/site-packages/pyhf/pdf.py:657, in <listcomp>(.0)
    653 def _modifications(self, pars):
    654     deltas = list(
    655         filter(
    656             lambda x: x is not None,
--> 657             [self.modifiers_appliers[k].apply(pars) for k in self._delta_mods],
    658         )
    659     )
    660     factors = list(
    661         filter(
    662             lambda x: x is not None,
    663             [self.modifiers_appliers[k].apply(pars) for k in self._factor_mods],
    664         )
    665     )
    667     return deltas, factors

File /usr/local/venv/lib/python3.10/site-packages/pyhf/modifiers/histosys.py:169, in histosys_combined.apply(self, pars)
    166 else:
    167     histosys_alphaset = self.param_viewer.get(pars)
--> 169 results_histo = self.interpolator(histosys_alphaset)
    170 # either rely on numerical no-op or force with line below
    171 results_histo = tensorlib.where(
    172     self.histosys_mask, results_histo, self.histosys_default
    173 )

File /usr/local/venv/lib/python3.10/site-packages/pyhf/interpolators/code4p.py:85, in code4p.__call__(self, alphasets)
     80 alphas_times_deltas_dn = tensorlib.einsum(
     81     'sa,shb->shab', alphasets, self.deltas_dn
     82 )
     84 # for |a| < 1
---> 85 asquare = tensorlib.power(alphasets, 2)
     86 tmp1 = asquare * 3.0 - 10.0
     87 tmp2 = asquare * tmp1 + 15.0

File /usr/local/venv/lib/python3.10/site-packages/pyhf/tensor/numpy_backend.py:287, in numpy_backend.power(self, tensor_in_1, tensor_in_2)
    286 def power(self, tensor_in_1: Tensor[T], tensor_in_2: Tensor[T]) -> ArrayLike:
--> 287     return np.power(tensor_in_1, tensor_in_2)

KeyboardInterrupt:

Something is going wrong in the build, and this should get diagnosed. It might be some timeout issue as this cell takes the longest to run.

Tutorial on batching with pyhf

Came up as part of #76 which is a good way to motivate the usage of batched models.

Add example of what the expected and observed CLs values actually are

Given @sabinekraml's good GitHub Discussion question (that we've seen before in Issues) scikit-hep/pyhf#1619, along with scikit-hep/pyhf#1620 we should make this more explicit by adding a notebook that demonstrates the differences between the expected and observed CLs values given

If you switch from observed data to expected data — your background-only model will have NPs centered at different values because they're fit to different datasets. The expected CLs has to be different.

cc @kratsg @lukasheinrich

ReadError: not a gzip file and other exceptions

Hi! I saw that there are a bunch of errors thrown in the current docs:

Read error at: https://pyhf.github.io/pyhf-tutorial/SerializationAndPatching.html
Keyboard interrupt at https://pyhf.github.io/pyhf-tutorial/Toys.html
InvalidWorkspaceOperation at https://pyhf.github.io/pyhf-tutorial/Combinations.html

Use pip-tools lock files to create consistent install environments

To avoid problems with updates to jupyter-book breaking the environment (c.f. PR #35, PR #36, PR #37) use pip-tools to produce lock files and install from those for deployment.

Small bugs in 'SimpleWorkspace.ipynb' and 'Combinations.ipynb'

Hi!

While clicking through the tutorials with pyhf 0.7.0. I noticed that in "SimpleWorkspace", the cell containing

workspace.get_measurement(poi_name="mu")

failed because the signature seems to have changed: get_measurement() got an unexpected keyword argument 'poi_name'

Similarly, in "Combinations" Cell 5 fails as the combined_workspace object doesn't seem to have a parameters attribute.
I'm not entirely sure about this second one though as it also fails with an AttributeError here: https://pyhf.github.io/pyhf-tutorial/Combinations.html
Is this intended behavior?

I know these are very small things, however as a new user any unexpected behavior is a bit confusing.

Cheers,
Moritz

Doc rendering of toys tutorial is broken

Hi all, looking through the tutorial page I've noticed that a tutorial was broken, namely https://pyhf.github.io/pyhf-tutorial/Toys.html#hypothesis-testing-revisited (if you scroll down to the very end of the output)

I assume it's a minor thing that you will be able to immediately fix, so I didn't dig into the error itself

Use Docker to build lock file

Given problems in PR #57 of building a pip-compile based lock file targeting Linux x86_64 when on a Apple silicon Mac, it is probably preferable to switch to building a lock file using a python:3.11 Docker container that can then force the environment to be x86_64, even if emulation is needed.

The workflow should also having @matthewfeickert's personal path information in the lock file, like it is now:

pyhf-tutorial/book/requirements.lock

Line 5 in f913d9d

    
           #    pip-compile --generate-hashes --output-file=/home/feickert/Code/GitHub/pyhf-org/pyhf-tutorial/book/requirements.lock /home/feickert/Code/GitHub/pyhf-org/pyhf-tutorial/requirements.txt

as this injects noise in the diffs if anyone else builds it.

Allow for deployment on workflow dispatch

As brought up by @klieret in Issue #18, changing

pyhf-tutorial/.github/workflows/deploy-jupyter-book.yml

Lines 38 to 39 in 1821e1a

    
               - name: Deploy Jupyter book to GitHub pages 
        
                 if: success() && github.event_name == 'push' && github.ref == 'refs/heads/main'

to deploy on github.event_name != 'pull_request' would allow for deployment on workflow_dispatch.

Originally posted by @matthewfeickert in #18 (comment)

SUSY Limit Plotting

This issue describes the way of presenting SUSY limits as it has been agreed between ATLAS and CMS. An example from JHEP 12 (2019) 060 can be seen here:

Description of limit lines

Observed limit (thick solid dark-red line): all uncertainties are included in the fit as nuisance parameters, with the exception of the theoretical signal uncertainties (PDF, scales, etc.) on the inclusive cross section.
±1σ lines around observed limit (1) with style "thin dark-red dotted": re-run limit calculation (1) while increasing or decreasing the signal cross section by the theoretical signal uncertainties (PDF, scales, etc.).
Expected limit (less thick long-dashed dark-blue line): all uncertainties are included in the fit as nuisance parameters, with the exception of the theoretical signal uncertainties (PDF, scales, etc.) on the inclusive cross section.
±1σ band around expected limit (2) with style "yellow band": the band contours are the ±1σ results of the fit (2).

Implementation

Below an implementation of the calculation of the above CLs value using pyhf and cabinetry is presented.

Workspace

The workspace is created by a cabinetry configuration file. Is should be noted that this workspace includes multiple signal models, i.e. multiple signal Samples with different NormFactors each. An example of such configuration file can be seen here: config_excl_AnalysisChallenge_grid.yml.

Code snippet

# get model and observed data (pre-fit)
model, data = cabinetry.model_utils.model_and_data(ws)
channels = model.config.channels

# workspace and model without signal theory uncertainties. signal theory uncertainties have the string '_Signal_' in their name.
signal_theory_names = [systematic['Name'] for systematic in cabinetry_config['Systematics'] if '_Signal_' in systematic['Name']]
pruned_ws = pyhf.Workspace(ws)._prune_and_rename(prune_modifiers=signal_theory_names)
pruned_model, _ = cabinetry.model_utils.model_and_data(pruned_ws)

# signal normalisation parameter names (having the string 'Signal_norm' in their name).
signal_norms = [param for param in model.config.par_names if 'Signal_norm' in param]
# signal normalisation parameters indices for both full and pruned models
signal_norms_idx = [model.config.par_map[norm]['slice'].start for norm in signal_norms]
pruned_signal_norms_idx = [pruned_model.config.par_map[norm]['slice'].start for norm in signal_norms]
# signal theory modifier indices
signal_theory_idx = [model.config.par_map[modifier]['slice'].start for modifier in signal_theory_names]

def fix_and_calculate_CLs(param):
    '''
    Fix all signal normalisation parameters to zero except the one specified by param
    then calculate expected and observed CLs values
    '''

    # Expected CLs
    print(f"Calculating explected CLs for {param}")

    # get the index of the param, this will be the new POI
    pruned_poi_index = cabinetry.model_utils._poi_index(pruned_model, poi_name=param)
    # set POI to param
    pruned_model.config.set_poi(pruned_model.config.par_names[pruned_poi_index])

    # indices of all the signal normalisation parameters except the poi_idx
    pruned_fixed_signal_norms_idx = [idx for idx in pruned_signal_norms_idx if idx!=pruned_poi_index]

    # set the rest of signal normalisation parameters fixed at zero
    pruned_fix_pars = pruned_model.config.suggested_fixed().copy()
    pruned_init_pars = pruned_model.config.suggested_init().copy()
    for idx in pruned_fixed_signal_norms_idx:
        pruned_fix_pars[idx] = True
        pruned_init_pars[idx] = 0.0
    
    # calculate expected CLs with respect to POI
    try:
        _, CLs_exp_band = pyhf.infer.hypotest(1.0, asimov_dataset, pruned_model, init_pars=pruned_init_pars, fixed_params=pruned_fix_pars, return_expected_set=True)
        # keep only ±1 sigma and convert to simple floats
        CLs_exp_band = list(map(lambda x: float(x), CLs_exp_band[1:4]))
    except:
        CLs_exp_band = None
    
    # Observed CLs
    print(f"Calculating observed CLs for {param}")

    # get the index of the param, this will be the new POI
    poi_index = cabinetry.model_utils._poi_index(model, poi_name=param)
    # set POI to param
    model.config.set_poi(model.config.par_names[poi_index])
    
    # indices of all the signal normalisation parameters except the poi_idx
    fixed_signal_norms_idx = [idx for idx in signal_norms_idx if idx!=poi_index]

    # set the rest of signal normalisation parameters fixed at zero
    fix_pars = model.config.suggested_fixed().copy()
    init_pars = model.config.suggested_init().copy()
    for idx in fixed_signal_norms_idx:
        fix_pars[idx] = True
        init_pars[idx] = 0.0

    CLs_obs_band = []
    # run three fits with different fixed signal theory NPs
    for signal_theory_np in [-1.0, 0.0, 1.0]:
        # fix signal theory NPs
        for idx in signal_theory_idx:
            fix_pars[idx] = True
            init_pars[idx] = signal_theory_np

        # calculate observed CLs with respect to POI
        try:
            CLs_obs = pyhf.infer.hypotest(1.0, data, model, init_pars=init_pars, fixed_params=fix_pars)
            # convert to simple floats
            CLs_obs = float(CLs_obs)
        except:
            CLs_obs = None

        # save results
        CLs_obs_band.append(CLs_obs)

    return [CLs_obs_band, CLs_exp_band]

# results dictionary
post_fit_significance_results = {}

# loop over the signal parameters of the workspace and calculate
for param in signal_norms:
    result = fix_and_calculate_CLs[param]
    post_fit_significance_results[param] = result

Revise upper limits notebook to address statistical descriptions

Expected Events

References:

Statistical Error Propagation (J. Phys. Chem. A 2001, 105, 15, 3917–3921)

root-project/root#13404

Is there a reason that these references are chosen? Is there some HEP reference paper on stats that cites the J. Phys. Chem. A 2001 paper? If not, let's check the PDG to see if this is included there, unless there was something particularly striking about this reference.

Discovery $p$-values

From Equation 52 in arXiv:1503.07622:

Maybe I need to read this section more carefully, but how does Equation 52 show the following statements? Regardless of how it does, can we make this more explicit here?

$p_{\mu}$ (and $\mathrm{CL}_b$)

$\mathrm{CL}_\mathrm{b}$ provides a measure of compatibility of the observed data with the 95% CL signal strength hypothesis relative to fluctuations of the background

This is mixing multiple ideas of an upper limit signal strength and the definition of $\mathrm{CL}_b$. $\mathrm{CL}_b$ is just a statement of the $p$-value of the background only hypothesis given the observed test-statistic, and then flipped with the $1-p_b$ part for geometric reasons for use in the ratio that gives $\mathrm{CL}_s$ — it is a construction that exists only by necessity for the $\mathrm{CL}_s$ method. The $\mathrm{CL}_b$ makes no statement on what is the observed test statistic that you use. Here we are choosing the test stat for the 95% CL upper limit but that is just a choice as we wanted a test size of $\alpha=0.05$, but it could have been anything. So I we should not include the test size in the definition (3 line equality) of $\mathrm{CL}_b$.

See the paragraph above plot on page two in this document.

Is there another reference that we can use? I know that this one comes up in search results because it is the only bloody thing that isn't firewalled by ATLAS, but I've never been a huge fan of this note and it kinda bugs me that it doesn't have any stable reference. It isn't even on CDS (private to ATLAS) as far as I know, but I would love to be wrong on this!

$p_0$

$p(s)=0$ measures compatibility of the observed data with the background-only (zero signal strength) hypothesis relative to fluctuations of the background.

$p$-values do not measure compatibility of a model and data. I know that the caption for Table 20 literally says this, but it is wrong. $p$-values are the probability given a model hypothesis to have observed a test statistic as extreme or more so (extreme in terms of disagreement with the model hypothesis) than the observed test statistic. They can not make statements on model compatibility with the data. So I would suggest we rephrase this section.

Making the table

It would probably be useful to mention that DRInt_cuts should correspond to Signal Region "DR-Int-EWK" and comment on why the results we get here are slightly different form the published table. This is maybe beating the point over the head, but I think it would help (would help 02:00 Matthew!).

Originally posted by @matthewfeickert in #57 (review)

In PR #57 there was some disagreement on the accuracy of different statistical descriptions that had been provided as template statements by ATLAS. This is not an ATLAS document so the descriptions of the statistical procedure should be arrived at independently .

Link not working: Issue on page /HelloWorld.html

Hi,

In the following sentence:

Nicely enough, pyhf is smart enough to let us know that, if for a model where 
, we’re not going to be able to catch it because our default bounds were bounding the parameter of interest from 
below at 0. For more information, see Equation 14 in [[10.7.2727]](https://arxiv.org/abs/10.7.2727).

The link to the arxiv paper does not work.
Could you please fix it?

Thanks
Aman

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

	- name: Deploy Jupyter book to GitHub pages
	if: success() && github.event_name == 'push' && github.ref == 'refs/heads/main'