TIMBER {#mainpage}


TIMBER (Tree Interface for Making Binned Events with RDataFrame) is an easy-to-use, fast Python analysis framework for processing CMS data sets. Default arguments assume the NanoAOD format, but any ROOT TTree can be processed.

Quick install

Python 3 is recommended since Python 2.7 is no longer supported. Make sure your ROOT installation was built with Python 3 support. Working in a virtual environment is also recommended. The commands below use virtualenv, but you are free to use your favorite tool for the job (virtualenv can be installed for Python 3 with `pip3 install virtualenv`).

```bash
virtualenv timber-env
source timber-env/bin/activate
git clone https://github.com/lcorcodilos/TIMBER.git
cd TIMBER
source setup.sh
```
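After setup, a quick way to confirm that the environment matches the recommendations above is a check like the following. It is a sketch, not part of TIMBER (`check_environment` is a hypothetical helper): it only verifies that the interpreter is Python 3 and that the PyROOT `ROOT` module can be located, without importing it.

```python
import importlib.util
import sys


def check_environment():
    """Report whether this is Python 3 and whether PyROOT can be found.

    importlib.util.find_spec locates the module without executing it.
    """
    return {
        "python3": sys.version_info.major >= 3,
        "root_found": importlib.util.find_spec("ROOT") is not None,
    }


if __name__ == "__main__":
    status = check_environment()
    print(status)
    if not status["root_found"]:
        print("PyROOT not found: check that ROOT was built with Python 3 support.")
```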

The RDataFrame Backbone

TIMBER's speed comes from ROOT's RDataFrame, which offers "multi-threading and other low-level optimisations" that make analysis-level processing faster and more efficient than traditional Python `for` loops. RDataFrame derives this speed from its C++ back end: although an RDataFrame object can be created and manipulated in Python, the actions performed on it (cuts, new column definitions) are written in C++, even when you call them from Python.
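Because those actions are C++ expressions, Python code typically builds them as strings. The sketch below joins several cuts into one C++ boolean expression; `combine_cuts` is a hypothetical helper, not a TIMBER function, and the commented lines only indicate where the string would go if PyROOT is available (`file.root` is a placeholder).

```python
def combine_cuts(cuts):
    """Join C++ boolean expressions into one expression with logical AND.

    Each cut is wrapped in parentheses so operator precedence inside a cut
    cannot leak into the combined expression.
    """
    if not cuts:
        return "true"  # C++ literal that keeps every event
    return " && ".join(f"({cut})" for cut in cuts)


selection = combine_cuts(["nElectron > 0", "Electron_pt[0] > 30"])
print(selection)  # (nElectron > 0) && (Electron_pt[0] > 30)
# With PyROOT, the string would be passed to RDataFrame, e.g.:
#   df = ROOT.RDataFrame("Events", "file.root").Filter(selection)
```

Since C++ `&&` short-circuits, placing `nElectron > 0` first means `Electron_pt[0]` is only read when the event has at least one electron.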

No more `for` loops

Using RDataFrame means fundamentally rethinking how we treat a block of data or simulation. Instead of looping over the events (entries) of a TTree or other data format, the TTree is treated as a table called the "data frame". The user then books a number of "lazy" actions on the data frame, such as filtering out events or calculating new values. These actions are not performed until the data frame needs to be evaluated (for example, when you ask for a histogram made from it).

In this way, there are no more explicit `for` loops, only actions on the data frame that transform it into a final table of the values the analyzer cares about.
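The booking model can be illustrated with a toy pure-Python class. This is an analogy under stated assumptions, not RDataFrame's or TIMBER's API: `LazyFrame`, `filter`, `define` and `collect` are hypothetical names, and the real event loop runs in C++. Booking only records an action; a single pass over the rows happens when a result is requested.

```python
class LazyFrame:
    """Toy model of a lazily evaluated data frame (not the TIMBER or ROOT API).

    Booking a filter or a new column only records the action; the rows are
    traversed once, when a result is finally requested with collect().
    """

    def __init__(self, rows, actions=()):
        self._rows = rows
        self._actions = tuple(actions)

    def filter(self, predicate):
        # Return a new frame with one more booked action; nothing is evaluated yet.
        return LazyFrame(self._rows, self._actions + (("filter", predicate),))

    def define(self, name, func):
        # Book a new column computed per row; again, nothing is evaluated yet.
        return LazyFrame(self._rows, self._actions + (("define", name, func),))

    def collect(self, column):
        """Run the single pass over the rows and return one column of the survivors."""
        values = []
        for original in self._rows:
            row = dict(original)  # copy so defined columns never modify the input
            keep = True
            for action in self._actions:
                if action[0] == "filter":
                    if not action[1](row):
                        keep = False
                        break  # later actions are skipped for rejected rows
                else:
                    _, name, func = action
                    row[name] = func(row)
            if keep:
                values.append(row[column])
        return values


events = [
    {"Jet_pt": [60.0, 50.0]},
    {"Jet_pt": [30.0]},
    {"Jet_pt": [200.0, 40.0, 25.0]},
]
# Book a scalar-sum HT column and a cut on it; neither runs yet.
booked = LazyFrame(events).define("HT", lambda r: sum(r["Jet_pt"])).filter(lambda r: r["HT"] > 100.0)
print(booked.collect("HT"))  # [110.0, 265.0]
```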

Anatomy of a data frame

Each row of the table is a separate event and each column is a different variable in the event (a branch in TTree terms). Columns can hold single values or vectors (specifically `ROOT::VecOps::RVec`).

Since each row is an event, vectors are needed whenever an event contains several objects of the same type, for example, multiple electrons.
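To make the layout concrete, here is a toy Python picture of three rows with made-up values and NanoAOD-style branch names (`nElectron`, `Electron_pt`, `Electron_eta`). Plain lists stand in for `RVec`, and `count_above` is a hypothetical helper showing a per-event reduction of a vector column to a single number.

```python
# One dict per row (event); list-valued entries stand in for RVec columns.
events = [
    {"event": 101, "nElectron": 2, "Electron_pt": [45.2, 21.7], "Electron_eta": [0.3, -1.1]},
    {"event": 102, "nElectron": 0, "Electron_pt": [], "Electron_eta": []},
    {"event": 103, "nElectron": 1, "Electron_pt": [33.0], "Electron_eta": [2.1]},
]


def count_above(values, threshold):
    """Reduce a vector column to one number per event: objects above a threshold."""
    return sum(1 for v in values if v > threshold)


# Vector lengths vary from row to row and match the per-event counter.
assert all(ev["nElectron"] == len(ev["Electron_pt"]) for ev in events)

n_passing = [count_above(ev["Electron_pt"], 25.0) for ev in events]
print(n_passing)  # [1, 0, 1]
```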

NOTE: NanoAOD orders these vectors by the \f$p_T\f$ of the objects, highest first. So if you'd like the \f$\eta\f$ of the leading electron, it is stored as `Electron_eta[0]`.

This can make accessing values tricky! For example, if an event has only one electron and the analyzer asks for `Electron_eta[1]`, the code reads past the end of the vector; in C++ this is undefined behavior and often ends in a segmentation fault. These are the types of problems that TIMBER attempts to solve (even if only by having users share their experiences).
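The usual remedy is to check the vector's size before indexing it. The Python sketch below shows the idea with a hypothetical helper, `object_at`; the C++ expression in the comment is one common way to write the same guard in an RDataFrame string, and its `-999` sentinel is an arbitrary choice, not a TIMBER convention.

```python
def object_at(values, index, default=None):
    """Return values[index] only if the event has enough objects, else `default`.

    Negative indices are rejected too, so Python's wrap-around indexing cannot
    silently pick the wrong object.
    """
    if 0 <= index < len(values):
        return values[index]
    return default


# NanoAOD vectors are pT-ordered, so index 0 is the leading object.
electron_eta = [2.1]  # an event with exactly one electron
print(object_at(electron_eta, 0))  # 2.1 (leading electron)
print(object_at(electron_eta, 1))  # None (no subleading electron to read)
# Equivalent guard inside a C++ expression (sentinel value is arbitrary):
#   "nElectron > 1 ? Electron_eta[1] : -999.f"
```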

Happy Analyzers

TIMBER is meant to keep both the processing fast, via RDataFrame, and the analyzer fast, via Python scripting.

To preserve Python's appeal in HEP as a quick scripting language, TIMBER handles the interface with RDataFrame so the analyzer can focus on writing their analysis.

TIMBER automates opening one or many ROOT files, calculating the number of events generated (provided the ROOT files are NanoAOD simulation), loading C++ code for use while looping over the data frame, and grouping actions for easy manipulation.

In addition, TIMBER treats each step in the RDataFrame processing as a "node" and keeps track of these nodes as a larger tree. Each action (or group of actions) performed on a node produces another node, and nodes store information about their parents and children. This makes it possible to write tools like `Nminus1()`, which takes as input a node and a group of N cuts and returns N new nodes, each with every cut but one applied.
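The node bookkeeping and the N-1 idea can be sketched in pure Python. This is a toy model, not TIMBER's implementation or signatures: `Node`, `cut` and `nminus1` are hypothetical names, nodes only store cut strings instead of wrapping an RDataFrame, and the cut expressions use NanoAOD-style branch names for illustration.

```python
class Node:
    """Toy processing node (not TIMBER's class): remembers its parent, its
    children, and every cut applied on the path from the root to itself."""

    def __init__(self, name, cuts=(), parent=None):
        self.name = name
        self.cuts = tuple(cuts)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)  # keep the tree navigable both ways

    def cut(self, name, expression):
        """Apply one more cut and return the resulting child node."""
        return Node(name, self.cuts + (expression,), parent=self)


def nminus1(node, cuts):
    """Return one new node per cut, each with every cut except that one applied.

    `cuts` maps a cut name to its C++ expression; dicts keep insertion order.
    """
    result = {}
    for skipped in cuts:
        current = node
        for name, expression in cuts.items():
            if name != skipped:
                current = current.cut(name, expression)
        result["minus_" + skipped] = current
    return result


base = Node("base")
selection = {"met": "MET_pt > 100", "jets": "nJet >= 2", "veto": "nElectron == 0"}
for label, leaf in nminus1(base, selection).items():
    print(label, leaf.cuts)
# minus_met ('nJet >= 2', 'nElectron == 0')
# minus_jets ('MET_pt > 100', 'nElectron == 0')
# minus_veto ('MET_pt > 100', 'nJet >= 2')
```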

Finally, the RDataFrame for each node is always kept easily accessible so that any of the native RDataFrame tools are at the user's fingertips.

Sharing is caring

TIMBER includes a repository of common algorithms used frequently in CMS that access scale factors, calculate pileup weights, and more. These are all written in C++ for use in `Cut` and `Define` arguments and are provided so that users share a common toolbox. Additionally, the AnalysisModules folder welcomes custom C++ modules on a per-analysis basis so that the code can be properly archived for future reference and shared with other analyzers.
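As a generic illustration of one such algorithm (not TIMBER's C++ module, whose interface is not described here), pileup reweighting is commonly done by dividing the normalized data pileup distribution by the normalized simulated one, then looking up each simulated event's weight by its true number of interactions (`Pileup_nTrueInt` in NanoAOD). The sketch assumes both distributions are lists of bin contents with unit-width bins starting at zero and nonzero totals; the zero weight for empty or out-of-range bins is also an assumption.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights: normalized data distribution over normalized MC distribution.

    Assumes both histograms share the same binning and have nonzero totals.
    Bins with no MC entries get weight 0.0 (an assumption of this sketch).
    """
    data_total = sum(data_hist)
    mc_total = sum(mc_hist)
    weights = []
    for d, m in zip(data_hist, mc_hist):
        if m == 0:
            weights.append(0.0)
        else:
            weights.append((d / data_total) / (m / mc_total))
    return weights


def weight_for_event(n_true_int, weights):
    """Look up one event's weight, assuming unit-width bins starting at zero."""
    index = int(n_true_int)  # truncate to the bin that contains the value
    if 0 <= index < len(weights):
        return weights[index]
    return 0.0  # out of range: assumption, other conventions exist


weights = pileup_weights([1, 2, 1], [2, 1, 1])
print(weights)  # [0.5, 2.0, 1.0]
print(weight_for_event(1.7, weights))  # 2.0
```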

osherson / timber
