TIMBER {#mainpage}

Full Documentation

TIMBER (Tree Interface for Making Binned Events with RDataFrame) is an easy-to-use and fast Python analysis framework for quickly processing CMS data sets. Default arguments assume the NanoAOD format, but any ROOT TTree can be processed.

Quick install

Python 3 is recommended because Python 2.7 is no longer supported. Make sure your ROOT installation was built with Python 3 support. Working in a virtual environment is also recommended. The commands below use virtualenv, but you are free to use your preferred tool for the job (virtualenv for Python 3 can be installed with pip3 install virtualenv).

virtualenv timber-env
source timber-env/bin/activate
git clone https://github.com/lcorcodilos/TIMBER.git
cd TIMBER
source setup.sh
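
As a quick sanity check on the Python 3 requirement above, the following minimal sketch asks the active interpreter whether it can locate a ROOT module. The helper name root_bindings_available is hypothetical and not part of TIMBER. The check only confirms that the module can be found, not that it imports and runs correctly.

```python
import importlib.util
import sys


def root_bindings_available(module_name="ROOT"):
    """Return True if this interpreter can locate the named module.

    Hypothetical helper for illustration only; it does not import the module.
    """
    return importlib.util.find_spec(module_name) is not None


print("Python version:", sys.version.split()[0])
if root_bindings_available():
    print("ROOT module found for this interpreter")
else:
    print("ROOT module not found; check that ROOT was built with Python 3 support")
```

Running this inside the activated virtual environment shows whether that environment, rather than the system Python, can see ROOT.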

The RDataFrame Backbone

TIMBER's speed comes from ROOT's RDataFrame. RDataFrame offers "multi-threading and other low-level optimisations" that make analysis-level processing faster and more efficient than traditional Python for loops. However, RDataFrame derives its speed from its C++ back-end: an RDataFrame object can be instantiated and manipulated in Python, but the actions booked on it are still written in C++, even when they are called from Python.

No more for loops

Using RDataFrame means fundamentally rethinking how we treat a block of data or simulation. Instead of looping over the events or entries of a TTree (or other data format), the TTree is converted into a table called the "data frame". The user then books a number of "lazy" actions on the data frame, such as filtering out events or calculating new values. These actions are not performed until the data frame needs to be evaluated (e.g. when you ask for a histogram from it).

In this way, there are no more for loops and instead just actions on the data frame table that transform it into a final table of values that the analyzer cares about.
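
The booking pattern described above can be illustrated with a toy model in pure Python. This is only a sketch of the idea of lazy actions, not RDataFrame or TIMBER: the ToyFrame class and its collect method are hypothetical, and the Filter and Define names merely echo the kind of actions mentioned above. The toy also takes Python callables where a real data frame would take C++ expressions.

```python
class ToyFrame:
    """Toy stand-in for a data frame: actions are booked lazily and run on demand."""

    def __init__(self, rows, steps=()):
        self._rows = rows            # list of dicts, one dict per event
        self._steps = tuple(steps)   # booked (kind, name, func) steps, not yet run

    def Define(self, name, func):
        """Book a new column; nothing is computed yet."""
        return ToyFrame(self._rows, self._steps + (("define", name, func),))

    def Filter(self, func):
        """Book an event selection; nothing is computed yet."""
        return ToyFrame(self._rows, self._steps + (("filter", None, func),))

    def collect(self, column):
        """Trigger evaluation: run every booked step and return one column."""
        values = []
        for row in self._rows:       # the only event loop, hidden from the user
            event = dict(row)
            keep = True
            for kind, name, func in self._steps:
                if kind == "define":
                    event[name] = func(event)
                elif not func(event):
                    keep = False
                    break
            if keep:
                values.append(event[column])
        return values


calls = []  # records every time the cut is actually evaluated


def pt_cut(event):
    calls.append(event["pt"])
    return event["pt"] > 30.0


events = [{"pt": 25.0, "eta": 0.1}, {"pt": 45.0, "eta": -1.2}, {"pt": 60.0, "eta": 2.0}]
frame = ToyFrame(events).Filter(pt_cut).Define("abs_eta", lambda e: abs(e["eta"]))
evaluations_after_booking = len(calls)   # booking alone has not run the cut
selected = frame.collect("abs_eta")      # evaluation happens here
```

Booking the filter and the new column touches no events; the cut function only runs once a result is requested.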

Anatomy of a data frame

Each row of the table is a separate event and each column is a different variable in the event (a branch in TTree terms). Columns can be single values or vectors (specifically ROOT::VecOps::RVec).

Since each row is an event, vectors are necessary for the case of multiple of the same physics object in an event - for example, multiple electrons.

NOTE: NanoAOD orders these vectors by the \f$p_T\f$ of the objects, so if you'd like the \f$\eta\f$ of the leading electron, it is stored as Electron_eta[0].

This can make accessing values tricky! For example, if there's only one electron in an event and the analyzer asks for Electron_eta[1], the access is out of range and the job can crash with a segmentation fault. These are the types of problems that TIMBER attempts to solve (even if it's just by users sharing their experiences).
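
One common way to avoid the out-of-range access described above is to guard the index with the collection's size before reading it. The sketch below builds such a guarded C++ expression as a string. The helper guarded_element is hypothetical and not part of TIMBER, the counter branch name nElectron is assumed from the usual NanoAOD naming, and the sentinel value is arbitrary.

```python
def guarded_element(branch, index, count_branch, default=-999.0):
    """Build a C++ expression string that reads branch[index] only if it exists.

    Hypothetical helper: the returned string is meant to be used wherever a
    C++ expression for a new column is accepted.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    return "{count} > {i} ? {branch}[{i}] : {default}".format(
        count=count_branch, i=index, branch=branch, default=default
    )


# Subleading electron eta, falling back to a sentinel when there is no second electron.
# "nElectron" is assumed to be the per-event electron count.
subleading_eta_expr = guarded_element("Electron_eta", 1, "nElectron")
print(subleading_eta_expr)
```

The alternative is to first filter on the counter so that only events with at least two electrons reach the expression, which avoids sentinel values at the cost of dropping events.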

Happy Analyzers

TIMBER is meant to keep both the processing fast, via RDataFrame, and the analyzer fast, via Python scripting.

To maintain Python's appeal in HEP as a quick scripting language, TIMBER handles interfacing with RDataFrame so the analyzer can focus on writing their analysis.

TIMBER automates opening one or many ROOT files, calculating the number of events generated (provided the ROOT files are NanoAOD simulation), loading in C++ scripts for use while looping over the data frame, and grouping actions for easy manipulation.

In addition, TIMBER treats each step in the RDataFrame processing as a "node" and keeps track of these nodes as a larger tree. Each action (or group of actions) performed on a node produces another node and nodes store information about their parents or children. This makes it possible to write tools like Nminus1() which takes as input a node and a group of cuts to apply and returns N new nodes, each with every cut but one applied.
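
The node bookkeeping and the N-1 idea described above can be sketched with a toy tree in pure Python. This is not TIMBER's implementation: ToyNode and n_minus_1 are hypothetical names, cuts are represented only by their labels, and returning a dictionary keyed by the skipped cut is a choice made for this illustration.

```python
class ToyNode:
    """Toy processing step that remembers its parent and its children."""

    def __init__(self, name, cuts=(), parent=None):
        self.name = name
        self.cuts = tuple(cuts)   # labels of all cuts applied up to this node
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def apply(self, cut_name):
        """Applying a cut produces a new child node; this node is unchanged."""
        return ToyNode(self.name + "__" + cut_name, self.cuts + (cut_name,), parent=self)


def n_minus_1(node, cut_names):
    """Return one node per cut, each with every cut except that one applied."""
    results = {}
    for skipped in cut_names:
        current = node
        for cut in cut_names:
            if cut != skipped:
                current = current.apply(cut)
        results[skipped] = current
    return results


start = ToyNode("start")
nodes = n_minus_1(start, ["pt", "eta", "mass"])
for skipped, node in nodes.items():
    print("without", skipped, "->", node.cuts)
```

Each returned node can be traced back to the starting node through its parent links. This toy rebuilds every chain from scratch; an implementation that cares about speed could share the steps that several chains have in common.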

Finally, the RDataFrame for each node is always kept easily accessible so that any of the native RDataFrame tools are at the user's fingertips.

Sharing is caring

TIMBER includes a repository of common algorithms used frequently in CMS that access scale factors, calculate pileup weights, and more. These are all written in C++ for use in Cut and Define arguments and are provided so that users have a common toolbox to share. Additionally, the AnalysisModules folder welcomes custom C++ modules on a per-analysis basis so that the code can be properly archived for future reference and shared with other analyzers.
