
CombineHarvester's Introduction

CombineHarvester

Full documentation: http://cms-analysis.github.io/CombineHarvester

Quick start

This package requires HiggsAnalysis/CombinedLimit to be in your local CMSSW area. We follow the release recommendations of the combine developers, which can be found here. The CombineHarvester framework is compatible with the CMSSW 14_1_X and 11_3_X series releases. The default branch, main, is for developments in the 14_1_X releases, and the current recommended tag is v3.0.0-pre1. The v2.1.0 tag should be used in CMSSW_11_3_X.

A new full release area can be set up and compiled in the following steps:

cmsrel CMSSW_14_1_0_pre4
cd CMSSW_14_1_0_pre4/src
cmsenv
git clone https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
# IMPORTANT: Checkout the recommended tag on the link above
git clone https://github.com/cms-analysis/CombineHarvester.git CombineHarvester
cd CombineHarvester
git checkout v3.0.0-pre1
scram b

Previously this package contained some analysis-specific subpackages. These packages can now be found here. If you would like a repository for your analysis package to be created in that group, please create an issue in the CombineHarvester repository stating the desired package name and your NICE username. Note: you are not obliged to store your analysis package in this central group.

CombineHarvester's People

Contributors

abdollah110, adavidzh, adewit, ajgilbert, amarini, anehrkor, anigamova, arturakh, dabercro, danielwinterbottom, dennroy, dsperka, fcolombo, greyxray, hroskes, izaakwn, jonathon-langford, kpedro88, matz-e, nsmith-, pieterdavid, rcaspart, smonig, swertz, thomas-mueller, threiten, truggles, tstreble, veelken, vicha-w


CombineHarvester's Issues

How to specify the binning for CLs grids

The current implementation in https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/python/combine/LimitGrids.py builds grids of points from a syntax like 1:10:1, which gives the points 1,2,3,4,5,... We should also support a binning-style syntax in the more familiar TH1 style, clearly distinguishable from the current one: something like 9;1;10, giving the bin-centre points 1.5,2.5,3.5,... It would also be nice to have a way of setting the grid automatically to the exact granularity and range of the theory input TH2s.
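
A minimal sketch of how the two syntaxes could be told apart and expanded (plain Python; the separator choice and the function name are assumptions, not the actual LimitGrids.py code):

def expand_grid(spec):
    # "min:max:step" (current syntax)    -> min, min+step, ..., max
    # "nbins;min;max" (TH1-style syntax) -> the nbins bin centres between min and max
    if ';' in spec:
        nbins, lo, hi = spec.split(';')
        nbins, lo, hi = int(nbins), float(lo), float(hi)
        width = (hi - lo) / nbins
        return [lo + (i + 0.5) * width for i in range(nbins)]
    lo, hi, step = (float(v) for v in spec.split(':'))
    points, val = [], lo
    while val <= hi + 1e-9:
        points.append(val)
        val += step
    return points

# expand_grid('1:10:1') -> [1.0, 2.0, ..., 10.0]
# expand_grid('9;1;10') -> [1.5, 2.5, ..., 9.5]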

Need to "mask" channels which go outside of some validity range in a combination

Possible way to do it without modifying combine:

  • After producing the workspace identify the RooAddPdf for the channel that needs to be masked
  • Get a list of its dependent variables
  • Replace each variable with a function that evaluates to zero outside the valid range and the original parameter value inside it
  • Finally can rebuild the full RooSimultaneous replacing the pdfs that were modified

Now when we go outside the range, the channel NLL no longer changes as a function of the parameters; it just leaves a constant term, which shouldn't affect the minimisation at all.
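
A rough PyROOT sketch of the replacement step described above (the file, workspace and pdf names, the choice of MH as the masking variable, and the validity range are all assumptions; an untested outline, not a recipe):

import ROOT

fin = ROOT.TFile.Open("workspace.root")   # hypothetical combine workspace file
w = fin.Get("w")
pdf = w.pdf("pdf_binch1")                 # hypothetical RooAddPdf of the channel to mask
mh = w.var("MH")
lo, hi = 200.0, 700.0                     # hypothetical validity range in MH

cust = ROOT.RooCustomizer(pdf, "masked")
params = pdf.getParameters(ROOT.RooArgSet())
it = params.createIterator()
par = it.Next()
while par:
    if isinstance(par, ROOT.RooRealVar) and not par.isConstant():
        # Equal to the original parameter inside [lo, hi], zero outside it
        masked = ROOT.RooFormulaVar(
            par.GetName() + "_masked",
            "(@1 >= {lo} && @1 <= {hi}) ? @0 : 0.".format(lo=lo, hi=hi),
            ROOT.RooArgList(par, mh))
        ROOT.SetOwnership(masked, False)
        cust.replaceArg(par, masked)
    par = it.Next()
masked_pdf = cust.build(True)
# masked_pdf would then replace the original pdf when rebuilding the RooSimultaneous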

ValidateDatacards.py cannot read csv inputs.


CMSSW: CMSSW_10_2_13

Higgs combine tool env:

  1. Branch: CMSSW_10_2_X
  2. Tag: v8.2.1 (for csv support)

CombineHarvester branch: master (last commit: 128e41e)


Example data cards and data:

https://gitlab.cern.ch/bff-zprime/exo-datacards/-/tree/EXO-22-006/EXO-22-006

Issue:
While running ValidateDatacards.py datacard.txt I receive this error:

[SetFlag] Changing value of flag "check-negative-bins-on-import" from 1 to 0
[SetFlag] Changing value of flag "workspaces-use-clone" from 0 to 1
Error in <TFile::Init>: 2017_shapes_df_input.csv not a ROOT file
Traceback (most recent call last):
  File "/afs/cern.ch/work/r/rymuelle/public/nanoAODzPrime/higgscombine/CMSSW_10_2_13/bin/slc7_amd64_gcc700/ValidateDatacards.py", line 110, in <module>
    cb.ParseDatacard(args.cards,"","",mass=args.mass)
  File "/afs/cern.ch/work/r/rymuelle/public/nanoAODzPrime/higgscombine/CMSSW_10_2_13/python/CombineHarvester/CombineTools/ch.py", line 21, in ParseDatacard
    return self.__ParseDatacard__(card, analysis, era, channel, bin_id, mass)
RuntimeError: 
*******************************************************************************
Context: Function ch::CombineHarvester::ParseDatacard at 
  /afs/cern.ch/work/r/rymuelle/public/nanoAODzPrime/higgscombine/CMSSW_10_2_13/src/CombineHarvester/CombineTools/src/CombineHarvester_Datacards.cc:131
Problem: Workspace not found in file
*******************************************************************************
Please report issues at
  https://github.com/cms-analysis/CombineHarvester/issues
*******************************************************************************

It appears that it expects a ROOT file. A search through the documentation and the GitHub repo did not clearly show a solution.

Deprecate old docs?

It appears full-documentation.md and tutorials-part-2.md are both outdated and redundant with other portions of the documentation. Should we remove them? Same for the tutorial2020 folder.

Conflicts between scripts

Hi,

It seems there are several plot1DScan.py files in the script directories of the various sub-packages... When running scram these get copied to the CMSSW bin folder in a random order and overwrite the "main" script, which is clearly undesirable. What's more, at least one is not an executable file, so it's not even found as a command.

This could also be the case for other scripts; I haven't checked. I can submit a PR to rename them, but perhaps you have another preferred solution?

[Question] PostFitShapesFromWorkspace

I could have gotten something wrong, but PostFitShapesFromWorkspace doesn't seem to take toy production into account. "data_obs" is always the original input from the workspace, even when running FitDiagnostics with -t -1. Is this intended behaviour?

Move CombineHarvester into Combine

  • Move CH into Combine
  • Unify datacard parsing used in text2workspace.py and CombineHarvester. Make sure that there is no ambiguous behaviour, i.e. all datacard grammar cases are taken into account.
  • Update documentation.

scram b error

Hi,

I was following the instructions in "Examples Part 1"; when doing scram b, I am getting an error like

Compiling /uscms_data/d3/snandan/CMSSW_7_4_7/src/HiggsAnalysis/CombinedLimit/src/RooParametricShapeBinPdf.cc
/uscms_data/d3/snandan/CMSSW_7_4_7/src/HiggsAnalysis/CombinedLimit/src/RooParametricShapeBinPdf.cc: In member function 'virtual Double_t RooParametricShapeBinPdf::analyticalIntegral(Int_t, const char*) const':
/uscms_data/d3/snandan/CMSSW_7_4_7/src/HiggsAnalysis/CombinedLimit/src/RooParametricShapeBinPdf.cc:236:1: internal compiler error: in possible_polymorphic_call_targets, at ipa-devirt.c:1556
}
^
Please submit a full bug report,

I didn't modify anything inside. Could anyone please let me know what is wrong?

Thanks
Saswati

Shape template tools

Extend the CombineHarvester functionality (or the CH methods, after the CH-combine merge) to include:

  • Up/Down variation smoothing with simple TH1 methods. Consider various methods, possibly with a chi^2 test to make sure that the smoothed histogram is an accurate representation of the initial histogram (see the sketch after this list).

  • Template binning optimisation.
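
A minimal PyROOT sketch of the smoothing idea mentioned above, assuming the nominal and varied templates are plain TH1s; the chi^2 option and threshold are illustrative choices, not an existing CH method:

import ROOT

def smooth_variation(nominal, varied, ntimes=1, min_pvalue=0.05):
    # Smooth the shift of the variation relative to the nominal with TH1::Smooth,
    # then use a chi2 test ("WW" = both histograms weighted) to check that the
    # smoothed variation is still an accurate representation of the original one.
    smoothed = varied.Clone(varied.GetName() + "_smoothed")
    smoothed.Add(nominal, -1.0)   # isolate the up/down shift
    smoothed.Smooth(ntimes)
    smoothed.Add(nominal, 1.0)    # re-apply the smoothed shift to the nominal
    pvalue = varied.Chi2Test(smoothed, "WW")
    return smoothed if pvalue > min_pvalue else varied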

Integrate HybridNewGrid into the combineTool.py framework

Then do some extensive implementation/testing of the termination criteria for generating toys at a given point (a sketch of one possible check follows the list):

  • minimum number of toys
  • maximum number of toys
  • absolute accuracy on CLs
  • minimum significance from interesting CLs (usually 0.05) and for which contours (obs, exp, exp+1, exp-1 etc)
  • info on toy fit failure rate
  • a way to merge output files so we don't end up with directories containing 1000s of files. https://root.cern.ch/doc/master/classTZIPFile.html - might be one way. Obviously if we're going to start deleting the user's job output this has to work with the different submission systems.
  • As well as CLs grid output should also save information on the number of toys that have been thrown at each point, and which (if any) of the criteria have been met. From this it would be good to have heat-map plots of the number of toys and the different observed and expected contours overlaid.
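
A rough sketch of what a per-point termination check might look like (plain Python; the default thresholds and the function signature are assumptions, not the combineTool.py interface):

def point_is_done(n_toys, cls, cls_err,
                  min_toys=500, max_toys=5000,
                  abs_accuracy=0.01, target_cls=0.05, min_signif=3.0):
    # Stop when the maximum number of toys is reached, or when the minimum has
    # been reached and either the absolute CLs accuracy is met or the CLs value
    # is separated from the interesting value (e.g. 0.05) by min_signif sigma.
    if n_toys >= max_toys:
        return True
    if n_toys < min_toys:
        return False
    if cls_err < abs_accuracy:
        return True
    if cls_err > 0 and abs(cls - target_cls) / cls_err >= min_signif:
        return True
    return False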

ParseDatacard with RooDataHist inputs

I’m trying to (gradually) switch to CombineHarvester for the hh->bbtautau analyses, but it seems I’m not able to use it to parse a datacard/workspace with RooDataHist objects. Do you know if this is a known limitation, or if there’s something special to do to load RooDataHists in CombineHarvester?

Let me describe in a bit more detail what is going on.
You can find here:
/afs/cern.ch/user/g/gortona/public/4Andrew/cards_MuTauprova/HHSM2b0jMcutBDTMT2/hh_2_C2_LHHSM_13TeV.txt
a simplified card with just a few backgrounds and no shape systematics. It can be linked to either a file with one TH1F per process or to a file with a workspace where shapes are stored as RooDataHists (both files are in the same folder).
text2workspace works properly, so the card should be fine, and indeed when I parse the card linking it to the TH1Fs all goes through properly. The script I am using is “harvesta.py” in the same folder.
But when I link the card to the workspace, I hit "Problem: RooAbsData not found in workspace”.
I tried to dig a bit, and it seems that in loading the workspace into CombineHarvester the dataset names are lost. Adding mapping.ws->Print() at line 319 of CombineHarvester.cc prints out the message below, but unfortunately I couldn’t track the origin of this behaviour any further.

Did this already happen to you? Do you know if I’m doing anything wrong, or if there is any way to fix this behaviour?

Thank you a lot,
Giacomo

mapping.ws->Print()


RooWorkspace(w) w contents

variables
---------
(CMS_HHbbtt_scale_tau,MT2)

p.d.f.s
-------
RooHistPdf::ggHH_hbbhtt[ pdfObs=(MT2) ] = 0

datasets
--------
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)
RooDataHist::(MT2)

PS I’m using combineHarvester with combine v6.3.0 under CMSSW_7_4_7

Unexpected behavior generating toys from model with 2 POI

Reporting an issue seen with CMSSW_10_2_13, combine v8.2.0

I am running an analysis with 2 POI (rggF and rVBF). I submitted 100 parallel jobs of 10 toys each, expecting 1000 toys as a result. Instead, I see exactly 4x the expected number of toys in the output tree. The output tree contains sets of 4 toys with identical POI fitted values and errors, but with different values in the "limit" branch. In one of the four toys in the set, the limit branch matches the fitted value of the first listed POI. I attach a few plots here for demonstration, and in case it is useful I have uploaded an example output file on lxplus here:
/afs/cern.ch/user/j/jdickins/public/combine-issue/bias20VBF.root

limit_rggF.pdf
limit_rVBF.pdf
limitErr_rggFErr.pdf
limitErr_rVBFErr.pdf

This is the command I used for toy submission:
combineTool.py -M FitDiagnostics --setParameters rVBF=$bias,rggF=1 --trackParameters rggF,rVBF --trackErrors rggF,rVBF -n bias${bias}VBF -d ${modelfile} --cminDefaultMinimizerStrategy 0 --robustFit=1 -t 10 -s 1:100:1 --job-mode condor --task-name VBF$bias

The issue is two-fold:

  • The output should not contain 4x the number of toys requested
  • The limit branch should contain some sensible value for models where 2 POI are fitted simultaneously. It appears somewhat random at the moment.

Thanks, and let me know if any additional information from me can be useful.

Citing CombineHarvester

How should one cite the use of the CombineHarvester tool? Are there any publications about it specifically or is it enough to just use the citations listed on combine's FAQ?

Specific combine tasks and plotting examples

Should provide example scripts and documentation for the following:

  • Standard Brazilian band limit plot with observed and expected
    • should explain how to produce pre-fit limits, blinding
    • also how to use the combineTool.py option with POI ranges as a function of MH
  • Signal injected limit plot using combine toy-generating machinery (need to explain snapshots and pre-fit/post-fit)
  • P-value/significance vs MH, with expected and optionally expected band
  • Nuisance parameter impacts and pulls plotting / comparison
  • 1D likelihood scans for POIs, producing mu-value breakdown summary plots
  • Generic 2D likelihood scans, extracting and plotting contours. ggH vs bbH is a good example
  • Channel compatibility check & toys
  • Goodness of fit & toys - check combineTool.py implementation for toys
  • Replacing observed data with asimov in the workspace for blinding
  • BSM physics models, but don't expect a lot of customers for this one yet

Later:

  • Physics models: kV,kF, mu_V, mu_F
  • Feldman-Cousins intervals
  • New goodness-of-fit tests? Anderson-Darling could be better at identifying systematic biases in the tails (in progress)

Nicer default impact plot

Hi,

In many CMS papers, including the combine paper, the impact plots shown are nicer to look at. The colors are sharper. This style is not the one we get by running the default plotImpacts.py (as of v2.0.0).

It would be awesome if one of these nicer styles were included in the default script, lowering the barrier to publication-quality impact plots.

Cheers,
Afiq

combineTool.py -M Impacts cannot account for toy input

When using -t 1 --toysFile <filename>.root to load in a toy dataset, the Impacts algorithm will not account for the change in output file names that includes the seed.

For example, after --doInitialFit, the output will be a file that looks like higgsCombine_initialFit_Test.MultiDimFit.mH{m}.123456.root but the --doFits step will look for higgsCombine_initialFit_Test.MultiDimFit.mH{m}.root. I'm working around this by renaming the files between steps but I imagine this should be changed in lines like these [1].

[1] -

'higgsCombine_initialFit_%(name)s_POI_%(poi)s.MultiDimFit.mH%(mh)s.root' % vars(), [poi], poiList))
else:
initialRes = utils.get_singles_results(
'higgsCombine_initialFit_%(name)s.MultiDimFit.mH%(mh)s.root' % vars(), poiList, poiList)
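
A minimal sketch of the renaming workaround mentioned above (plain Python; the seed value and the glob pattern are assumptions based on the example file names in this issue):

import glob
import os

seed = '123456'   # the seed used when running with -t 1 --toysFile

# Strip the seed from the --doInitialFit outputs so that the --doFits step finds
# the file names it expects, e.g.
#   higgsCombine_initialFit_Test.MultiDimFit.mH125.123456.root
#   -> higgsCombine_initialFit_Test.MultiDimFit.mH125.root
for path in glob.glob('higgsCombine_initialFit_*.MultiDimFit.mH*.%s.root' % seed):
    os.rename(path, path.replace('.%s.root' % seed, '.root'))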

Double no longer an attribute of ROOT for root version 6.22

When trying to make "brazilian" limit plots with CombineTools,
python /my/eos/space/CMSSW_11_3_4/src/CombineHarvester/CombineTools/scripts/plotLimits.py limits_RSGluon.json --show exp --logy
I get an error where ROOT.Double is not an attribute that can be used. The traceback is included below for details:
Traceback (most recent call last):
  File "/my/eos/space/CMSSW_11_3_4/src/CombineHarvester/CombineTools/scripts/plotLimits.py", line 112, in <module>
    axis = plot.CreateAxisHists(len(pads), list(graph_sets[-1].values())[0], True)
  File "/my/eos/space/CMSSW_11_3_4/python/CombineHarvester/CombineTools/plotting.py", line 487, in CreateAxisHists
    h = CreateAxisHist(src, at_limits)
  File "/my/eos/space/CMSSW_11_3_4/python/CombineHarvester/CombineTools/plotting.py", line 469, in CreateAxisHist
    x = R.Double(0.)
  File "/cvmfs/cms.cern.ch/slc7_amd64_gcc900/lcg/root/6.22.08-ljfedo/lib/ROOT/_facade.py", line 163, in _fallback_getattr
    raise AttributeError("Failed to get attribute {} from ROOT".format(name))
AttributeError: Failed to get attribute Double from ROOT

For more details: I have installed the recommended version of Combine (v9), followed by version 2.0.0 of CombineHarvester, and a clone of the auxiliaries directory as recommended here.

Following a forum discussion, it seems that this is because of ROOT version 6.22 (which I think comes with CMSSW_11_3_4).
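
A possible compatibility shim for scripts that only use ROOT.Double as an output argument (this mirrors the ctypes-based fix proposed in the "CreateAxisHist function" issue below; a sketch, not an official patch):

import ctypes
import ROOT

try:
    Double = ROOT.Double      # old PyROOT (ROOT < 6.22)
except AttributeError:
    Double = ctypes.c_double  # new PyROOT

x, y = Double(0.), Double(0.)
# graph.GetPoint(0, x, y)  # read back via x.value (ctypes) or float(x) (old PyROOT)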

General plotting updates

General tasks that should probably be done before adding a bunch of new stuff:

  • Clean up plotting.py: group functions by common functionality, and roughly from low-level to high-level. Add a description for each function - can we get it into doxygen somehow?
  • Finish importing functions from Plotting.h. Do we want to try and maintain both interfaces or just stick with the pyroot? Probably easier to maintain just one implementation.
  • Would be good to have simple functions for building common plot objects from the combine limit tree. These would then be the ingredients we use to give full examples, but could easily be reused to make a different plot.
    • limits (vs MH or some other continuous/discrete variable)
    • p-values/significances
    • NLL scans (1 or 2D), reporting of intervals for the former, extracting contours for the latter

Job splitting for CRAB

Hi,

I'm trying to figure out how to run GoodnessOfFit toys over CRAB. I understand that I can run e.g. 25 toys in a single crab job like this:

combineTool.py \
    --job-mode crab3 \
    -M GoodnessOfFit \
    -d /path/to/card.root \
    -t 25 \
    # (other args)

This works fine. I would like to extend this though to be able to run many toys split over a number of jobs. I have tried two approaches to accomplish this:

  1. using the --merge argument. This does not seem to have any effect. I think this traces down to the fact that the combine tool thinks of my GoF command as "one entry in the job queue", rather than "25 independent entries".

  2. using the --custom-crab argument, and specifying config.Data.totalUnits = 50 for submission of e.g. 50 jobs. The submission works in this case, but the jobs fail because the script executed on the worker node tries to match the job ID to the job queue entries [1]. In the same vein as above, the job queue only has one entry here, so the script simply fails for all job IDs > 1.

Is there an existing well-defined way of doing this? If not, I can hotfix [1] for myself, but I'm not sure how to implement this in a sustainable way without creating spaghetti.

Any hints would be appreciated!

[1] https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/python/combine/CombineToolBase.py#L308
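
One possible hand-rolled workaround, assuming it is acceptable to submit several independent crab3 tasks, each running its own batch of toys with a distinct seed (a sketch only, not a supported combineTool.py feature; the card path and the numbers are placeholders):

# Split 500 toys into 20 separate submissions of 25 toys each; distinct seeds
# keep the toy datasets statistically independent, and the seed also appears in
# the output file names so they do not clash when merged later.
base = ('combineTool.py --job-mode crab3 -M GoodnessOfFit '
        '-d /path/to/card.root -t 25')
for i in range(20):
    print('%s -s %d' % (base, 1000 + i))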

Sorting of impacts

This line
https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/python/combine/Impacts.py#L97

does not work when a nuisance has a same-sign impact for opposite-sign variations. This then ripples through to plotting oddities as it is used in
https://github.com/cms-analysis/CombineHarvester/blob/master/CombineTools/scripts/plotImpacts.py#L43

Perhaps instead of (paramScanRes[param][p][2] - paramScanRes[param][p][0])/2. it is more sensible to use max(map(abs,(paramScanRes[param][p][2],paramScanRes[param][p][0]))).
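
To illustrate the difference between the two sort keys on a nuisance whose impact has the same sign for both variations (plain Python, with invented numbers):

# Hypothetical (down variation, best fit, up variation) values of the POI.
two_sided = (-0.10, 0.0, 0.12)   # opposite-sign impact
one_sided = (0.05, 0.0, 0.08)    # same-sign impact for opposite-sign variations

def key_current(entry):
    lo, bf, hi = entry
    return (hi - lo) / 2.

def key_proposed(entry):
    lo, bf, hi = entry
    return max(abs(lo), abs(hi))

print(key_current(two_sided), key_proposed(two_sided))  # 0.11  0.12
print(key_current(one_sided), key_proposed(one_sided))  # 0.015 0.08 (current key underestimates)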

`RestoreBinning` does not properly work for some binnings, e.g. DNN scores?

Dear CombineHarvester experts,

We noticed an unexpected behaviour of the PostFitShapesFromWorkspace tool when adding the datacard of a DNN score (0..1) for rebinning purposes:

The rebinned histogram contained, as bin content, only the content of the first bin of e.g. the prefit shapes.

Is this behaviour known?

(omitting the rebinning option via -d works fine and yields the expected output)

Best, Peter

Combine needs a mass value for the signal process

For some limit calculation methods, combine needs the signal process to have a mass value specified in the shape file. CH currently lets you make a datacard + shape file without a mass value; if this is done, it would be nice to assign the signal process a dummy mass value so that the datacard can be used with all combine methods.

ValidateDatacards.py cannot read Datacards which have one root file for each systematic

Hello,

As the title says: I have a datacard with one root file per systematic (70 systematics total). This would take hours to hadd into one file and I would not like to do that.
text2workspace can read such datacards without any problem and create a workspace, but ValidateDatacards.py seems not to be able to do that and prints the error below.

Is it possible to add this functionality?

Kind regards,
Andrej

[SetFlag] Changing value of flag "check-negative-bins-on-import" from 1 to 0
[SetFlag] Changing value of flag "workspaces-use-clone" from 0 to 1
Error in TSystem::ExpandFileName: input: ../../../miniTreeHistograms/MergedTrees/miniTree$SYSTEMATIC.root, output: ../../../miniTreeHistograms/MergedTrees/miniTree$SYSTEMATIC.root
Error in TFile::TFile: error expanding path ../../../miniTreeHistograms/MergedTrees/miniTree$SYSTEMATIC.root
Traceback (most recent call last):
File "/nfs/dust/cms/user/asaibel/PhDWorkFolder/HiggsCombineTool/SL7/CMSSW_10_2_13/bin/slc7_amd64_gcc700/ValidateDatacards.py", line 110, in
cb.ParseDatacard(args.cards,"","",mass=args.mass)
File "/nfs/dust/cms/user/asaibel/PhDWorkFolder/HiggsCombineTool/SL7/CMSSW_10_2_13/python/CombineHarvester/CombineTools/ch.py", line 21, in ParseDatacard
return self.ParseDatacard(card, analysis, era, channel, bin_id, mass)
RuntimeError:


Context: Function ch::GetClonedTH1 at
/nfs/dust/cms/user/asaibel/PhDWorkFolder/HiggsCombineTool/SL7/CMSSW_10_2_13/src/CombineHarvester/CombineTools/src/TFileIO.cc:21
Problem: TH1 Nominal/combined/GreaterThreeBTag/DeltaRbin1/GreaterThreeJet/data_obs/jet4_btag not found in ../../../miniTreeHistograms/MergedTrees/miniTree$SYSTEMATIC.root

CreateAxisHist function

The lines 473, 474, 478 and 480
min = float(x)
max = float(y)
should be changed to:
min = float(x.value)
max = float(y.value)

The way it is now it does not work, since x is not a number or a string; it is a ctypes.c_double object.
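
For context, a minimal sketch of the pattern in question with new PyROOT, assuming the values come from TGraph::GetPoint (not the exact plotting.py code):

import ctypes
import ROOT

graph = ROOT.TGraph(1)
graph.SetPoint(0, 1.5, 3.0)

x, y = ctypes.c_double(0.), ctypes.c_double(0.)
graph.GetPoint(0, x, y)   # fills x and y by reference
xmin = float(x.value)     # float(x) fails: x is a ctypes.c_double, not a number
ymax = float(y.value)
print(xmin, ymax)         # 1.5 3.0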

PostFitShapesFromWorkspace development

For complicated models with many categories the post-fit plotting can be quite time consuming. The easiest way to speed up the process is to introduce parallelisation.

  • Functionality to make post-fit plots only for a sub-set of categories in the datacard
  • Parallelise sampling (per-category) with batch submission

It would also be nice to have the following features:

  • Combine categories
  • Simplify sampling for other observables (not fit templates)

Validation tools

Need a class with a configurable set of checks to perform on a CH instance. These checks would include:

  • Checking bin and process names are sufficiently unique. I.e. one observation per bin name, >= 1 uniquely-named processes per bin name
  • Ensuring shapes are present for all Observations, Processes and shape Systematics. Verify that the binning is identical in all bins
  • Negative histogram bins. Provide options for correcting these - set bins to zero, set to a small positive value, merge with adjacent bins. (though automated bin merging is a big task in its own right)
  • Processes with no systematic uncertainties
  • Systematic uncertainties not connected to any process
  • Bins with no background expectation (and in particular no background but finite signal)
  • Maybe give information about pdf parameters that will or won't be getting a constraint
  • Could also tie-in the nuisance summary code here, though it's not strictly validation

Will add more things as they come up.

segmentation violation Error using partialCorrelationEdit.py script

Hello,
I have installed the latest version of combine tool and CombineHarvester and I tried the "partialCorrelationEdit.py" script using one of the simple shape Datacard examples in the
CMSSW_10_2_13/src/HiggsAnalysis/CombinedLimit/data/benchmarks/shapes/simple-shapes-TH1.txt

with the following command
python ../../../../../CombineHarvester/CombineTools/scripts/partialCorrelationEdit.py simple-shapes-TH1.txt -m 125 --process lumi,0.6
but I always get a segmentation violation error at the last stage of the script [1]. I also tried with different datacards but always get the same problem. Could you please let me know if I am missing something or if there is an issue with the script?
Thanks
Mohsen

[1][SetFlag] Changing value of flag "workspace-uuid-recycle" from 1 to 1
[SetFlag] Changing value of flag "workspaces-use-clone" from 0 to 1
[SetFlag] Changing value of flag "import-parameter-err" from 1 to 0
[SetFlag] Changing value of flag "zero-negative-bins-on-import" from 0 to 0
[SetFlag] Changing value of flag "check-negative-bins-on-import" from 1 to 0

Setting correlation coefficient of lumi to 0.600000
The following systematics will be cloned and adjusted:


mass analysis era channel bin id process sig nuisance type value sh_d sh_u

125 1 0 signal 1 lumi lnN 1.1 0 0
125 1 0 background 0 lumi lnN 1 0 0

Writing new card and ROOT file: ('decorrelated_card.txt', 'decorrelated_card.shapes.root')

*** Break *** segmentation violation

Numpy issue when running on lxplus

Hi there,

I have set up the CombineTool code on lxplus according to the instructions in the README for this repo (CMSSW_10_2_13 and Combine tag v8.2.0), but I see an error related to the numpy installation. Is this a known problem? And is there a workaround?

Thanks,
Jennet

Here is an example command and the accompanying error:

[jdickins@lxplus795 2016-prefit]$ combineTool.py -M Impacts -d $modelfile -m 125 --robustFit 1 --doInitialFit -t -1 --setParameters rggF=1,rVBF=1,rZbb=1
Traceback (most recent call last):
File "/afs/cern.ch/work/j/jdickins/hbb-prod-modes/test/CMSSW_10_2_13/bin/slc7_amd64_gcc700/combineTool.py", line 8, in
from CombineHarvester.CombineTools.combine.ImpactsFromScans import ImpactsFromScans
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/ROOT.py", line 318, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/afs/cern.ch/work/j/jdickins/hbb-prod-modes/test/CMSSW_10_2_13/python/CombineHarvester/CombineTools/combine/ImpactsFromScans.py", line 15, in
from numpy import matrix
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/ROOT.py", line 318, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.25/InstallArea/x86_64-slc6-gcc62-opt/lib/python2.7/site-packages/numpy-1.13.3-py2.7-linux-x86_64.egg/numpy/init.py", line 142, in
from . import add_newdocs
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/ROOT.py", line 318, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.25/InstallArea/x86_64-slc6-gcc62-opt/lib/python2.7/site-packages/numpy-1.13.3-py2.7-linux-x86_64.egg/numpy/add_newdocs.py", line 13, in
from numpy.lib import add_newdoc
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/ROOT.py", line 318, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.25/InstallArea/x86_64-slc6-gcc62-opt/lib/python2.7/site-packages/numpy-1.13.3-py2.7-linux-x86_64.egg/numpy/lib/init.py", line 8, in
from .type_check import *
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/ROOT.py", line 318, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.25/InstallArea/x86_64-slc6-gcc62-opt/lib/python2.7/site-packages/numpy-1.13.3-py2.7-linux-x86_64.egg/numpy/lib/type_check.py", line 11, in
import numpy.core.numeric as _nx
File "/cvmfs/cms.cern.ch/slc7_amd64_gcc700/lcg/root/6.12.07-gnimlf5/lib/ROOT.py", line 318, in _importhook
return _orig_ihook( name, *args, **kwds )
File "/cvmfs/atlas.cern.ch/repo/sw/software/21.2/AnalysisBaseExternals/21.2.25/InstallArea/x86_64-slc6-gcc62-opt/lib/python2.7/site-packages/numpy-1.13.3-py2.7-linux-x86_64.egg/numpy/core/init.py", line 26, in
raise ImportError(msg)
ImportError:
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try git clean -xdf (removes all
files not under version control). Otherwise reinstall numpy.

Original error was: libptf77blas.so.3: cannot open shared object file: No such file or directory

ROOT 6.22 compatibility (CMSSW 11_2)

After seeing that this PR in Combine was merged in this branch, I started doing the same kind of work for CombineHarvester (here).
Adapting to ROOT 6.22 seems to be limited to just a couple of TPython methods that have been moved in CPyCppyy and changed names (TPython::ObjectProxy_FromVoidPtr became CPyCppyy::Instance_FromVoidPtr and TPython::ObjectProxy_AsVoidPtr became CPyCppyy::Instance_AsVoidPtr) in the transition from "old" PyROOT to new PyROOT.

However, probably due to my lack of experience with scram, I haven't been able to succeed in correctly linking against libcppyy2_7.so in CMSSW_11_2_0_pre10.

To reproduce:

export SCRAM_ARCH=slc7_amd64_gcc900
cmsrel CMSSW_11_2_0_pre10
cd CMSSW_11_2_0_pre10/src
cmsenv
git clone -b 112x https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit.git HiggsAnalysis/CombinedLimit
git clone -b root622_comp https://github.com/maxgalli/CombineHarvester.git
scramv1 b

Running this returns a bunch of undefined references to the changed methods, like e.g.:

/cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld: tmp/slc7_amd64_gcc900/src/CombineHarvester/CombineTools/src/Combin
eHarvesterCombineTools/CombineHarvester_Python.cc.o: in function `convert_py_root_to_cpp_root<TFile>::construct(_object*, boost::python::converter::rvalue_from_python_stage1_data*)':
CombineHarvester_Python.cc:(.text._ZN27convert_py_root_to_cpp_rootI5TFileE9constructEP7_objectPN5boost6python9converter30rvalue_from_python_stage1_dataE[_ZN27convert_py_root_to_cpp_rootI5TFileE9constructEP7_object
PN5boost6python9converter30rvalue_from_python_stage1_dataE]+0x5): undefined reference to `CPyCppyy::Instance_AsVoidPtr(_object*)'

What I tried:
root-config --libs returns

-L/cvmfs/cms.cern.ch/slc7_amd64_gcc900/cms/cmssw/CMSSW_11_2_0_pre10/external/slc7_amd64_gcc900/bin/../../../../../../../slc7_amd64_gcc900/lcg/root/6.22.03-ghbfee2/lib -lCore -lImt -lRIO -lNet -lHist -lGraf -lGraf3d -lGpad -lROOTVecOps -lTree -lTreePlayer -lRint -lPostscript -lMatrix -lPhysics -lMathCore -lThread -lMultiProc -lROOTDataFrame -pthread -lm -ldl -rdynamic

The printed lib directory does contain libcppyy2_7.so, but ld does not seem to look there:

$ /cvmfs/cms.cern.ch/slc7_amd64_gcc900/external/gcc/9.3.0/bin/../lib/gcc/x86_64-unknown-linux-gnu/9.3.0/../../../../x86_64-unknown-linux-gnu/bin/ld --verbose | grep SEARCH_DIR | tr -s ' ;' \\012
SEARCH_DIR("/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_pre6-slc7_amd64_gcc900/build/CMSSW_11_1_0_pre6-build/tmp/BUILDROOT/610f04827ed0f827de9b8700ad3d9670/opt/cmssw/slc7_amd64_gcc900/external/gcc/9.3.0/x86_64-unknown-linux-gnu/lib64")
SEARCH_DIR("/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_pre6-slc7_amd64_gcc900/build/CMSSW_11_1_0_pre6-build/tmp/BUILDROOT/610f04827ed0f827de9b8700ad3d9670/opt/cmssw/slc7_amd64_gcc900/external/gcc/9.3.0/lib64")
SEARCH_DIR("/usr/local/lib64")
SEARCH_DIR("/lib64")
SEARCH_DIR("/usr/lib64")
SEARCH_DIR("/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_pre6-slc7_amd64_gcc900/build/CMSSW_11_1_0_pre6-build/tmp/BUILDROOT/610f04827ed0f827de9b8700ad3d9670/opt/cmssw/slc7_amd64_gcc900/external/gcc/9.3.0/x86_64-unknown-linux-gnu/lib")
SEARCH_DIR("/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_11_1_0_pre6-slc7_amd64_gcc900/build/CMSSW_11_1_0_pre6-build/tmp/BUILDROOT/610f04827ed0f827de9b8700ad3d9670/opt/cmssw/slc7_amd64_gcc900/external/gcc/9.3.0/lib")
SEARCH_DIR("/usr/local/lib")
SEARCH_DIR("/lib")
SEARCH_DIR("/usr/lib")

Also appending the path to LD_LIBRARY_PATH does not seem to work.

Does anyone have any suggestions concerning what to try and how to correctly implement it in the build process?

(pinging also @nsmith- and @andrzejnovak, who worked on adapting Combine to ROOT 6.22).

Check for parameters which are not "attached"

One rather easy bug to encounter is mis-naming of parameters in the datacard for param-style nuisances. It would be good if the validation could spot this happening and warn the user.
The simplest way would be to scan over parameters (maybe just nuisance parameters) and check whether they have more than just the Gaussian constraint in their client list (a sketch is given below).

Note that it's not enough to check whether there is a param which isn't encountered in the incoming workspaces, since it's feasible someone would create a param which, say, gets fed into a rateParam afterwards.
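
A rough PyROOT sketch of such a check, assuming a workspace produced by text2workspace.py with the usual "w" and "ModelConfig" names; the one-client heuristic is an assumption and would need the refinement described above:

import ROOT

fin = ROOT.TFile.Open("workspace.root")   # hypothetical workspace file
w = fin.Get("w")
mc = w.obj("ModelConfig")

it = mc.GetNuisanceParameters().createIterator()
par = it.Next()
while par:
    n_clients = 0
    cit = par.clientIterator()
    while cit.Next():
        n_clients += 1
    # A nuisance parameter whose only client is its own Gaussian constraint is
    # probably not attached to anything else in the model, i.e. likely mis-named.
    if n_clients <= 1:
        print('Possibly unattached parameter: %s' % par.GetName())
    par = it.Next()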

ValidateDatacards.py Problem: Systematic type autoMCStats not supported

Dear experts,

could you help with the ValidateDatacards.py problem, please?

A simple datacard is used (shown under [*]), and all other steps in the card validation and the pull distributions were produced with combine tools without issues using this card. But ValidateDatacards.py datacard.txt fails with the full error message printed below:

[SetFlag] Changing value of flag "check-negative-bins-on-import" from 1 to 0
[SetFlag] Changing value of flag "workspaces-use-clone" from 0 to 1
Traceback (most recent call last):
File "/cms/multilepton-3/olena/CMSSW_10_2_13/bin/slc7_amd64_gcc700/ValidateDatacards.py", line 110, in
cb.ParseDatacard(args.cards,"","",mass=args.mass)
File "/cms/multilepton-3/olena/CMSSW_10_2_13/python/CombineHarvester/CombineTools/ch.py", line 21, in ParseDatacard
return self.ParseDatacard(card, analysis, era, channel, bin_id, mass)
RuntimeError:


Context: Function ch::CombineHarvester::ParseDatacard at
/cms/multilepton-3/olena/CMSSW_10_2_13/src/CombineHarvester/CombineTools/src/CombineHarvester_Datacards.cc:454
Problem: Systematic type autoMCStats not supported


Please report issues at
https://github.com/cms-analysis/CombineHarvester/issues


[*] Datacard (simplified) :

imax 1 number of bins
jmax 2 number of processes minus 1
kmax 41 number of nuisance parameters
----------------------------------------------------------------------------------------------------------------------------------
shapes *     ch1  shapes_WPhiPS50mu_3L_LS0_Run2.root WPhimmMinOSSF3Lbin5B0_$PROCESS WPhimmMinOSSF3Lbin5B0_$PROCESS_$SYSTEMATIC
----------------------------------------------------------------------------------------------------------------------------------
bin          ch1
observation  1450
----------------------------------------------------------------------------------------------------------------------------------
bin          ch1   ch1   ch1
process      WPhiPS50mu   Prompt   MisID
process      0   1   2
rate         291.7487655130141   903.5581679344177   576.2285528182983
----------------------------------------------------------------------------------------------------------------------------------
BTagY16    shapeN2      -       1.0         -
BTagY17    shapeN2      -       1.0         -
BTagY18    shapeN2      -       1.0         -
LimiY16    shapeN2    1.0       1.0         -
LimiY16Y17Y18    shapeN2    1.0       1.0         -
LimiY17    shapeN2    1.0       1.0         -
LimiY17Y18    shapeN2    1.0       1.0         -
LimiY18    shapeN2    1.0       1.0         -
POGIDEleY16Y17Y18    shapeN2    1.0       1.0         -
POGIDMuY16Y17Y18    shapeN2    1.0       1.0         -
POGRecoEleY16Y17Y18    shapeN2    1.0       1.0         -
PU    shapeN2    1.0       1.0         -
ch1 autoMCStats 0 0 1

If I change the last line to ch1 autoMCStats 001 (without spaces), this error message does not appear. The output is:

[SetFlag] Changing value of flag "check-negative-bins-on-import" from 1 to 0
[SetFlag] Changing value of flag "workspaces-use-clone" from 0 to 1
================================
=======Validation results=======
================================
>>>There were no warnings

Could you advise if I'm using the wrong format for autoMCStats, or what the reason is for ValidateDatacards.py not recognizing ch1 autoMCStats 0 0 1 (other combine tools did not have errors regarding this line) but recognizing ch1 autoMCStats 001?

Thank you very much in advance,
Olena

Accidental push to cms-analysis CH instead of to the forked version

Dear all,

I've accidentally pushed some fork-specific changes for CH to the official CH repo without making a proper pull request:
https://github.com/cms-analysis/CombineHarvester/compare/041188035d29b08580f5e11f94d79b529f7c640c..master

However, as you can see from the comparison, these are only some cosmetic changes to the plotting, and extensions of the CMSHistFuncFactory, which are needed for the MSSM application.

If you are OK with these changes, we could leave them as they are, otherwise a git revert would be needed.

I apologize for this inconvenience,

Cheers,

Artur

ValidateDatacards.py looks for TH1 not required by datacard.

When running ValidateDatacards.py datacard.txt, I get:

*******************************************************************************
Context: Function ch::GetClonedTH1 at 
  /afs/cern.ch/work/r/rymuelle/public/nanoAODzPrime/higgscombine/CMSSW_10_2_13/src/CombineHarvester/CombineTools/src/TFileIO.cc:24
Problem: TH1 SR1-sys_0_nominal-0 not found in 2016/2016_shapes_df_input.root
*******************************************************************************
Please report issues at
  https://github.com/cms-analysis/CombineHarvester/issues
*******************************************************************************

However, the datacard in question does not require "SR1-sys_0_nominal-0", and I am unsure what sort of pattern ValidateDatacards.py is looking for that would cause it to look for this TH1.

Datacard in question:

Combination of name0=2016/2016_SR1_BFFZprimeToMuMu_fit_M_125_dbs0p5.txt  name1=2016/2016_SR2_BFFZprimeToMuMu_fit_M_125_dbs0p5.txt
imax 2 number of bins
jmax 1 number of processes minus 1
kmax 13 number of nuisance parameters
----------------------------------------------------------------------------------------------------------------------------------
shapes *           name0       2016/2016_shapes_df_input.root SR1-sys_0_nominal-$PROCESS SR1-$SYSTEMATIC-$PROCESS
shapes background  name0       2016/2016_shapes_df_input.root SR1-sys_0_nominal-background
shapes *           name1       2016/2016_shapes_df_input.root SR2-sys_0_nominal-$PROCESS SR2-$SYSTEMATIC-$PROCESS
shapes background  name1       2016/2016_shapes_df_input.root SR2-sys_0_nominal-background
----------------------------------------------------------------------------------------------------------------------------------
bin          name0  name1
observation  -1     -1   
----------------------------------------------------------------------------------------------------------------------------------
bin                                          name0       name0       name1       name1     
process                                      125         background  125         background
process                                      0           1           0           1         
rate                                         -1          -1          -1          -1        
----------------------------------------------------------------------------------------------------------------------------------
lumi                    lnN                  1.025       -           1.025       -         
sys_0.5_ISRFSR_2016_    shapeN2              1.0         -           1.0         -         
sys_0.5_L1_2016_        shapeN2              1.0         -           1.0         -         
sys_0.5_Muon_2016_      shapeN2              1.0         -           1.0         -         
sys_0.5_btag_2016_      shapeN2              1.0         -           1.0         -         
sys_0.5_elSF_2016_      shapeN2              1.0         -           1.0         -         
sys_0.5_jer_2016_       shapeN2              1.0         -           1.0         -         
sys_0.5_jes_2016_       shapeN2              1.0         -           1.0         -         
sys_0.5_pdf_2016_       shapeN2              1.0         -           1.0         -         
sys_0.5_pu_             shapeN2              1.0         -           1.0         -         
sys_0.5_puid_2016_      shapeN2              1.0         -           1.0         -         
sys_0.5_roch_2016_      shapeN2              1.0         -           1.0         -         
sys_0.5_trigger_2016_   shapeN2              1.0         -           1.0         -   

ValidateCards.py development

In addition to the existing JSON formatted output it would be nice to introduce:

  • Visualisation: optional plots for problematic regions, nominal template and up(down) variations

Future developments:

  • Think about and improve/add features to define problematic regions and shape variations

Setting the wrong nominal mass point while performing a fit with morphing

Hi!

The nominal mass point gets an unexpected value when calling BuildRooMorphing with the file parameter, for example when creating the workspace and datacards including:

...
RooRealVar scannedParameter("scannedParameter", "scannedParameter", -4.0, 18.0);
RooWorkspace ws("htt", "htt");
ch::CombineHarvester cb;
...
string b = cb.cp().channel({channel}).bin_set().at(0);
string p = cb.cp().bin({b}).signals().process_set().at(0);
TFile debug("debug.root", "RECREATE");
ch::BuildRooMorphing(ws, cb, b, p, scannedParameter, "norm", true, true, false, debug);
...

and then calling FitDiagnostics and PostFitShapesFromWorkspace to have a look at the pre/post-fits, one gets as the prefit the plot that corresponds to scannedParameter=18 in this example, i.e. to the last value set in the RooRealVar. My colleague pointed out to me that the actual reason for this is the way the RooRealVar is handed along in the morphing process (as a pointer), and the fact that its value gets modified when writing the debug file. One can work around it by not using the file parameter, but not by using the --setParameters option. I believe this is not wanted behaviour?

Thank you for your assistance!

Best regards, Olena

Adding rateParam creation

It makes sense to implement this as a ch::Systematic with type rateParam, but there are a few details to be worked out, such as adapting AddSyst and WriteDatacards.

combineTool.py: input/output file handling

Would be useful to have an option that copies input files to $TMPDIR at the beginning of the job, writes output locally (not in the job submission $PWD), and then copies it back to the submission dir (or some other dir) at the end.

Because in principle we know the name of the output file at submission time, we could also think about tracking completed/lost jobs.
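
A rough sketch of what such a wrapper could do (plain Python; the input file, the combine invocation and the output prefix are placeholders, not an existing combineTool.py option):

import os
import shutil
import subprocess
import tempfile

submit_dir = os.getcwd()
inputs = ['workspace.root']   # placeholder input files
work_dir = tempfile.mkdtemp(dir=os.environ.get('TMPDIR', '/tmp'))

# Stage inputs to local scratch, run there, then copy the outputs back.
for f in inputs:
    shutil.copy(os.path.join(submit_dir, f), work_dir)
subprocess.check_call(
    ['combine', '-M', 'AsymptoticLimits', '-d', 'workspace.root'], cwd=work_dir)
for f in os.listdir(work_dir):
    if f.startswith('higgsCombine'):
        shutil.copy(os.path.join(work_dir, f), submit_dir)
shutil.rmtree(work_dir)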

Segfault in CombineHarvester::WriteDatacard(string, string) in Python

I encountered a bug in the Python bindings. I'm trying to create a datacard for a shape-based analysis using the Python interface. With the current master (6af279a), running

harvester = ch.CombineHarvester()
# ...
harvester.WriteDatacard('datacard.txt', 'shapes.root')

triggers a segmentation violation. The code works fine if I create the ROOT file beforehand:

shapes_file = ROOT.TFile('shapes.root', 'recreate')
harvester.WriteDatacard('datacard.txt', shapes_file)

The backtrace to the main WriteDatacard method suggests that the problem is in the Python bindings:

#0  0x00007ffff675cb50 in ch::CombineHarvester::WriteDatacard(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, TFile&)@plt ()
   from /user/aapopov/CMSSW/CMSSW_10_2_13__combine/lib/slc7_amd64_gcc700/libCombineHarvesterCombineTools.so
#1  0x00007ffff696dc7a in Overload3_WriteDatacard (cb=..., name="datacard.txt", file=...)
    at /user/aapopov/CMSSW/CMSSW_10_2_13__combine/src/CombineHarvester/CombineTools/src/CombineHarvester_Python.cc:136
#2  0x00007ffff69cda79 in boost::python::detail::invoke<int, void (*)(ch::CombineHarvester&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::python::api::object&), boost::python::arg_from_python<ch::CombineHarvester&>, boost::python::arg_from_python<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>, boost::python::arg_from_python<boost::python::api::object&> > (
    f=@0x68de18: 0x7ffff696dc1c <Overload3_WriteDatacard(ch::CombineHarvester&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::python::api::object&)>, ac0=..., ac1=..., ac2=...) at /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/boost/1.63.0-gnimlf/include/boost/python/detail/invoke.hpp:81
#3  0x00007ffff69bf6a5 in boost::python::detail::caller_arity<3u>::impl<void (*)(ch::CombineHarvester&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::python::api::object&), boost::python::default_call_policies, boost::mpl::vector4<void, ch::CombineHarvester&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::python::api::object&> >::operator() (this=0x68de18, args_=0x7ffff6af2aa0)
    at /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/boost/1.63.0-gnimlf/include/boost/python/detail/caller.hpp:218
#4  0x00007ffff69b0343 in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<void (*)(ch::CombineHarvester&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::python::api::object&), boost::python::default_call_policies, boost::mpl::vector4<void, ch::CombineHarvester&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::python::api::object&> > >::operator() (this=0x68de10, args=0x7ffff6af2aa0, kw=0x0)
    at /cvmfs/cms.cern.ch/slc7_amd64_gcc700/external/boost/1.63.0-gnimlf/include/boost/python/object/py_function.hpp:38
...

Note how the call is passed through Overload3_WriteDatacard instead of Overload1_WriteDatacard. Apparently Overload3_WriteDatacard tries to convert a string into a TFile. I also see that file_ inside it is null. Unfortunately, I don't know enough about the bindings to propose a fix.

Request for new package

Hello,

I would like a package to be created for the ND-OSU Top EFT analysis. We would like to begin using CombineHarvester to create our datacards. The documentation instructed me to create an issue here. Let me know how I can help with this process.

Best,
Anthony Lefeld

ValidateDatacards.py fails in compiling RooFormulas

When datacards include rateParam lines featuring RooFormulaVars such as:

alphaMuon2FJ_2018_SRH  rateParam Muon2FJ_2018_SRH DYJ @0 alphaMuon2FJ_2018_2Lep

the ValidateDatacards.py tool seems not to be able to compile the above expression and breaks:

Error in <RooFormula::Compile>:  Bad numerical expression : "@0"

It seems that replacing the formula with "1.0*@0" fixes the compilation.

Edit: It seems that in general, if the RooFormula is expressed as a string, this does not cause compilation issues. So even:

alphaMuon2FJ_2018_SRH  rateParam Muon2FJ_2018_SRH DYJ "@0" alphaMuon2FJ_2018_2Lep

would work.

An example of a datacard to reproduce the issue is this HIG-22-005 card.
