A Python framework for analyzing NanoAODs. Easy to use and highly configurable.
The framework is designed to allow analysis of any type of event topology. For reference, this repository comes with the necessary tools for a dileptonic ttbar analysis.
It is recommended to use a proper environment with Pepper. An example environment setup for DESY NAF is provided in example/environment.sh, which can be sourced after cloning the repository. Pepper can be installed as a Python package as follows:
git clone <repository url> pepper
cd pepper
source example/environment.sh
python3 -m pip install --upgrade --upgrade-strategy eager --editable .
# Additionally only if on CentOS7 (e.g. DESY NAF in 2023):
python3 -m pip install "urllib3<2"
This updates all dependencies to their latest versions. Pepper can then be imported like any other Python package from any location. Because of the --editable option, changes you make to files inside your cloned pepper directory take effect the next time you import pepper.
Note: If you are on CentOS7, please run python3 -m pip install "urllib3<2", as written above. CentOS7 lacks a recent OpenSSL version, so an older urllib3 version is required.
In Pepper, an analysis is implemented as a Processor class. A short example of such a Processor, with many explanatory comments, can be found in the example directory. This processor can be run by executing python3 -m pepper.runproc example_processor.py example_config.json (when inside the example directory). Running python3 -m pepper.runproc -h shows the available command line options.
In order to run on HTCondor using pepper.runproc, one only has to specify the --condor parameter followed by the number of jobs desired. Events will be split across jobs as evenly as possible.
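To illustrate what "as evenly as possible" means, here is a minimal sketch of one such scheme: N events are divided into contiguous ranges whose sizes differ by at most one. The function split_events is hypothetical, and this is not necessarily how Pepper partitions its input internally (it may, for example, split along file or chunk boundaries).

```python
from typing import List, Tuple


def split_events(n_events: int, n_jobs: int) -> List[Tuple[int, int]]:
    """Split n_events into n_jobs contiguous [start, stop) ranges.

    Hypothetical helper for illustration only; not part of Pepper's API.
    Range sizes differ by at most one event.
    """
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    base, extra = divmod(n_events, n_jobs)
    ranges = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one leftover event.
        size = base + (1 if job < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(split_events(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```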
To control which environment is used on the HTCondor node, the parameter --condorinit can be used. It should point to a shell script that, when sourced, sets up the environment. If --condorinit is not given, Pepper instead uses the script pointed to by the local environment variable PEPPER_CONDOR_ENV. If this is also not set, the jobs run in the default environment of your HTCondor system. For an example environment script setting up LCG on CentOS7, see here.
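The precedence described above can be summarized in a few lines. This is an illustrative sketch of the documented behavior, not Pepper's own code; the function name resolve_condor_env is hypothetical, and treating an empty PEPPER_CONDOR_ENV as unset is an assumption.

```python
import os
from typing import Optional


def resolve_condor_env(condorinit: Optional[str]) -> Optional[str]:
    """Return the environment script HTCondor jobs would source.

    Hypothetical helper mirroring the documented precedence:
    1. the --condorinit argument, if given;
    2. otherwise the PEPPER_CONDOR_ENV environment variable, if set
       (an empty value is treated as unset here, which is an assumption);
    3. otherwise None, meaning the HTCondor default environment is used.
    """
    if condorinit:
        return condorinit
    env_script = os.environ.get("PEPPER_CONDOR_ENV")
    if env_script:
        return env_script
    return None
```

In practice this means that setting PEPPER_CONDOR_ENV once in your shell profile saves you from passing --condorinit on every run, while an explicit --condorinit still wins.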
When running on HTCondor, the local process that submitted the jobs must keep running until all jobs have finished. To avoid accidentally killing it, e.g. through an unstable connection, it is recommended to run it inside a byobu, tmux or screen session, or to prepend nohup to the command.
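If you drive runs from a Python script instead of typing the command, the invocation can be assembled from the options documented above. The helper runproc_command is hypothetical and only uses the --condor and --condorinit options named in this README; it uses sys.executable instead of a literal python3 so that the same interpreter (and thus the same environment) runs pepper.runproc.

```python
import sys
from typing import List, Optional


def runproc_command(processor: str, config: str,
                    condor_jobs: Optional[int] = None,
                    condorinit: Optional[str] = None) -> List[str]:
    """Build the argument list for `python3 -m pepper.runproc`.

    Hypothetical helper; only uses options documented in this README.
    """
    cmd = [sys.executable, "-m", "pepper.runproc", processor, config]
    if condor_jobs is not None:
        # Number of HTCondor jobs to split the events across
        cmd += ["--condor", str(condor_jobs)]
    if condorinit is not None:
        # Shell script sourced on the HTCondor node
        cmd += ["--condorinit", condorinit]
    return cmd


# Example (not executed here): subprocess.run(cmd, check=True) would block
# until runproc exits, so the calling script must also be kept alive.
```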
Logs from the jobs are written to the directory pepper_logs. The directories inside pepper_logs are numbered; the highest number belongs to the latest run. The log level is controlled via the --loglevel option. Set it to debug to get full logging.
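Because run directories are numbered, finding the latest run means comparing names numerically rather than alphabetically ("10" must come after "9"). A minimal sketch, assuming the run directories are named with plain integers; the function latest_log_dir is hypothetical.

```python
import os
from typing import Optional


def latest_log_dir(logs_root: str = "pepper_logs") -> Optional[str]:
    """Return the path of the highest-numbered run directory in logs_root.

    Assumes run directories are named by plain integers; other entries are
    ignored. Returns None if logs_root or any run directory is missing.
    """
    if not os.path.isdir(logs_root):
        return None
    runs = [
        name for name in os.listdir(logs_root)
        if name.isdigit() and os.path.isdir(os.path.join(logs_root, name))
    ]
    if not runs:
        return None
    # Compare numerically so that "10" ranks above "9".
    return os.path.join(logs_root, max(runs, key=int))
```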
Pepper can access data sets on remote servers using XRootD. To enable this, specify "file_mode": "local+xrootd" and "xrootddomain": "xrootd-cms.infn.it" in your config. There are four requirements for this to work (also in conjunction with HTCondor):
- xrootd must be installed. You can check this with python3 -m pip show xrootd
- The CMS Grid environment needs to be sourced (also inside the HTCondor job). This is done by the example environment script.
- The environment variable X509_USER_PROXY needs to be set to a file path accessible by Condor (/tmp/ is not). As above, this is also needed inside the Condor job and is done by the script.
- A VOMS proxy needs to be created at the path pointed to by X509_USER_PROXY. To do this, run once: voms-proxy-init --voms cms --out $X509_USER_PROXY
If you get the error 'sslv3 alert certificate expired', please run the voms-proxy-init command again.
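Some of these requirements can be checked locally before submitting jobs. The sketch below covers only the easy checks: it does not verify the CMS Grid environment or whether the proxy has expired. The function check_xrootd_requirements is hypothetical; it uses importlib.metadata (Python 3.8+) to look up the installed xrootd distribution, which corresponds to what pip show reports.

```python
import os
from importlib import metadata
from typing import List


def check_xrootd_requirements() -> List[str]:
    """Return a list of detected problems; an empty list means none found.

    Hypothetical helper; does not check the Grid environment or proxy validity.
    """
    problems = []
    try:
        # Same distribution name as in `python3 -m pip show xrootd`
        metadata.version("xrootd")
    except metadata.PackageNotFoundError:
        problems.append("xrootd is not installed")
    proxy = os.environ.get("X509_USER_PROXY")
    if not proxy:
        problems.append("X509_USER_PROXY is not set")
        return problems
    if os.path.abspath(proxy).startswith("/tmp/"):
        problems.append("X509_USER_PROXY points into /tmp/, which Condor cannot access")
    if not os.path.isfile(proxy):
        problems.append("no VOMS proxy at X509_USER_PROXY; run voms-proxy-init")
    return problems
```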
The scripts directory of this repository contains several helper scripts to obtain inputs and plot outputs:
- calulate_stitching_factors.py: Calculate factors to stitch MC samples from an already produced histogram (currently only 1D histograms are supported)
- compute_mc_lumifactors.py: Compute the factors needed to scale MC to data
- compute_pileup_weights.py: Compute scale factors for pileup reweighting
- delete_duplicate_outputs.py: Check for duplicates in the per-event data produced by select_events.py, and move or delete them
- export_hists_from_state.py: Save all histograms contained in a Pepper processor state, even if processing of all data hasn't finished yet
- generate_btag_efficiencies.py: Generate a ROOT file containing the efficiency histograms needed for b-tagging scale factors
- generate_jet_puid_efficiencies: Generate a ROOT file containing the efficiency histograms needed for computing jet pile-up ID scale factors
- get_bad_local_files.py: Find NanoAOD files that exist in the store directory but are not accessible, possibly due to technical issues
- hdf5_to_ttree.py: Merge and convert Pepper HDF5 files to ROOT files containing TTrees
- merge_hists.py: Calculate the weighted average of two SF histograms
- plot_control.py: Create control plots from Coffea histograms
- plotter.py: Create control plots from Coffea histograms
- produce_met_xy_nums.py: Convert MET-xy correction numbers from the centrally provided C++ headers to JSON files
- rucio_create_rules.py: Create Rucio rules for all data sets specified in a Pepper config. Once the rules are approved, the data sets will be transferred to the local site.
- ttbarll_dy_sf_calculate.py: Calculate scale factors for DY reweighting from the output of ttbarll_dy_sf_produce.py
- ttbarll_dy_sf_produce.py: Produce the numbers needed for DY SF calculation
- ttbarll_kinreco_hists_produce.py: Generate histograms needed for top-quark kinematic reconstruction
- ttbarll_select_events.py: Run the main ttbarll analysis procedure, outputting histograms and per-event data
- ttbarll_trigger_sf_calculate.py: Calculate SFs for ttbar dileptonic triggers from the output of ttbarll_trigger_sf_produce.py using the cross-trigger method
- ttbarll_trigger_sf_produce.py: Produce the numbers needed for trigger SF calculation
Configuration is done via JSON files. Examples can be found in the example directory. Additional data needed for configuration, for example scale factors and cross sections, is kept in a separate data repository: https://gitlab.cern.ch/pepper/data
After downloading it, make sure to set the configuration variable "datadir" to the path where the data repository was downloaded.
For a detailed explanation of the configuration variables, see config_documentation.md.
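Since configs are plain JSON, options such as "datadir", "file_mode" and "xrootddomain" can also be set programmatically, for example to derive a personal config from one of the examples. A minimal sketch, assuming the config is a single top-level JSON object; the helper write_config_with_options and the paths in the usage comment are hypothetical.

```python
import json
from typing import Any


def write_config_with_options(src: str, dst: str, **options: Any) -> dict:
    """Copy the JSON config at src to dst, adding or overriding top-level keys.

    Hypothetical helper; assumes the config is a single top-level JSON object.
    """
    with open(src) as f:
        config = json.load(f)
    config.update(options)
    with open(dst, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Example usage (paths are placeholders):
# write_config_with_options(
#     "example_config.json", "my_config.json",
#     datadir="/path/to/pepper-data",
#     file_mode="local+xrootd",
#     xrootddomain="xrootd-cms.infn.it",
# )
```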
Feel free to submit merge requests to have your code included in this repository! Your code must comply with PEP 8. You can check this by running the following inside the Pepper directory:
python3 -m pip install .[dev] --user
python3 -m flake8