lcorcodilos / timber Goto Github PK

C++ 41.63% Python 56.47% Shell 1.45% CSS 0.07% Makefile 0.38%

timber's Issues

Add CondorHelper as a subpackage

Update code that is here [1] and then link to that

[1] - https://github.com/lcorcodilos/CondorHelper

Automatically build lib-timber (Include paths in .cc files behave differently in CompileCpp and root .L)

When a .cc file is compiled via CompileCpp(), the relative import of #include "../include/file.h does not work. It can be changed to #include "TIMBER/Framework/include/file.h but then you can't open an interactive root session and do .L path/to/file.cc. Could be a conditional but this seems cumbersome.

Best option is to setup a file to build a full TIMBER library (via setup.sh), store it in TIMBER/bin, and include it with analyzer(). No more CompileCpp() needed for the included modules and no more headaches with the include paths.

Runs tree not saved when using "RECREATE" to snapshot

Logic will save the Runs tree and then immediately overwrite it.

Factor out clang parsing for generic use outside Correction class

Allow SetActiveNode to take in str with Node name

Add either a DeepAK8 SF module or pieces to existing modules

Information here https://twiki.cern.ch/twiki/bin/viewauth/CMS/DeepAK8Tagging2018WPsSFs

Scale factors here https://github.com/cms-jet/deepAK8ScaleFactors

Need a example for short run

I found a code of short run here. code
It use Range function to limit the range but it's used after you call EnableImplicitMT.
So maybe we need a example to tell user not to use EnableImplicitMT or move it down if Range is used.

Add tcsh support for setup.sh

Make .csh version that modifies activate.csh in virtual env

Document C++ modules

Update TIMBER/data/README.md

Has "JHUanalyzer" in it and no links to relevant C++ modules.

P_T reordering for JES/R and HEM modules

When applying any new JEC or variations on the jet momentum, the jets may no longer be ordered in pt. Any HEM issue module will have the same effect. It's currently the responsibility of the user to account for this (if at all).

Accounting for the re-ordering for the HEM issue is not so difficult because of the one-time nature of the module. However, doing the variations on the jet pt is frankly a big headache because each variation in the pt is a new re-ordering (meaning a new collection needs to be created for each variation).

Without TIMBER

There are a few ways around this that do no involve TIMBER automation that could have varying implications depending on the analysis.

Just ignore the pt being out of order. If the analysis doesn't make any decisions on pt ordering, there's no need to do anything. For this reason, re-ordering should always be optional and not forced (from the TIMBER development perspective).
Process the variations in pt one at a time with either (a) or (b) below. Will be less efficient but that may not matter if one is running on condor and the number of jobs is already low.
Process the variations in parallel, tracking each variation of ordering manually but building the dataframe actions concurrently. This is more computationally efficient but requires lots of tracking and is prone to error (unless there's a good TIMBER idea... see below).

For (2) and (3), there are two sub-options.

a. If the analysis is only using 1 or 2 jets, one can simply search for the highest pt values in the new pt vector, extract the indices as new column variables, and use these variables everywhere in place of 0 and 1.

b. Use ReorderCollection() [1] to re-order the entire collection and use the new collection for access to variables (with 0 or 1)

Option (b) is more computationally expensive and it doesn't do much to improve the user interface. Option (a) is a bit more error prone since you're dealing with indices and debugging and indexing issue can be difficult (or hard to identify in the first place).

With TIMBER

Indexing > new collections

Learning from the Without TIMBER section, the optimized option seems to be to develop a new set of indices stored in a separate branch and to direct the user to use these if they want the re-ordered collection. For example, FatJet_pt[0] would become CalibratedFatJet_pt[JES_index[0]] to get the new leading jet (where JEC_index is the pt ordered list of indices for the original collection.

This is lightweight enough that users could make the choice to not use these values if they don't care about pt ordering (and actually, then there would be no computational penalty since they column would never be used).

Simultaneous branch action solutions

As an example, we have something like this...

                  base
                   |
                   1
                   |
                   2
                /  |   \
               /   |    \
              /    |     \
pt          nom   up    down  
variation    |     |      |
             3     3      3
identical    |     |      |
actions      4     4      4
3 & 4

where 1, 2, 3, and 4 are some actions on the dataframe and nom, up, down are the three branches of the processing that change the pt.

Option 1

One solution is to have an AnalyzerGroup() class which parallelizes actions on separate branches of the processing tree. The methods of the AnalyzerGroup are the same as the analyzer but just loop over all analyzer objects in the group to perform the action.

Pros: A new class keeps the logic separable.
Cons: All methods would have to be hard coded or a new generic proxy method would need to be written (making subsequent actions look less like actions on a single analyzer object)

Option 2

Modify analyzer() to always track multiple dataframes (the base case being the one dataframe to start). Then analyzer.DataFrame would point to a list of RDataFrames (via Nodes), not a single RDataFrame and every method acting on a Node would actually loop over all Nodes being tracked. There are some potential complications

The Nodes need to be tracked via a dictionary with unique keys (maybe NodeGroup class?). In fact, you'd most likely need subkeys pointing to information about the branch. Something like

allCurrentNodes = {
    "key1": {
        "node": Node(...),
        "CalibratedFatJet_*[*]: "CalibratedFatJet_*[key1_idx[*]]", # pattern for index substitution for cool idea below
        ...
    }
}

Snapshots would have to be saved to separate TTrees carefully.
It would be reasonable to assume that subsequent splittings could happen and these will also need to be tracked. It's not clear if these should be nested but that would require nesting analyzer objects which would be a more complicated task.
One should be able to remove nodes from the active group being tracked

Pros: No duplicating of functions/methods and probably less code overall needing to be added/changed. Subdictionaries would be powerful for string substitution. Everything shows up nicely in one PrintNodeTree()!
Cons: Lots of string parsing and substitution which is always error prone and can be hard to debug when the print out is lengthy.

Cool idea: Store the "list" of dataframes/Nodes as a dictionary/NodeGroup and use the subkeys to denote suffix of ordering indexes. Then do automatic find/replace on action strings with key and value pairs in the subdictionary so that one could do

a.Cut("...","CalibratedFatJet_pt[0] > 400")

and get back

CalibratedFatJet_pt[key1_idx[0]] > 400
...

Module for HEM issue

Initial implementation below. Would be nice to account for re-ordering in this code.

#include "TIMBER/TIMBER/Framework/include/Pythonic.h"
#include "iostream"

using namespace Pythonic;
float HEMdropOrWeight(std::string setname, int run, RVec<float> FatJet_eta, RVec<float> FatJet_phi, std::vector<int> idxToCheck = {0,1}) {
    bool isAffectedData = InString("data",setname) && (InString("C",setname) || InString("D",setname) || InString("B",setname));
    bool isDataB = InString("dataB",setname);
    float eventAffected;

    if (FatJet_eta.size() < 2) {std::cout << FatJet_eta.size() << std::endl;}

    for (size_t i = 0; i < idxToCheck.size(); i++) {
        int idx = idxToCheck[i];
        if ((FatJet_eta[idx] < -1.3) && (FatJet_eta[idx] > -2.5) && (FatJet_phi[idx] < -0.87) && (FatJet_phi[idx] > -1.57)){
            if (isAffectedData) {
                if (isDataB && run<319077) {
                    eventAffected = 1.0; // If not affected 2018B
                } else {
                    eventAffected = 0.0; // If affected 2018B, C, or D
                }
            } else if (InString("data",setname)){
                eventAffected = 1.0; // If any other data
            } else {
                eventAffected = 0.353; // If MC
            }
        } else {
            eventAffected = 1.0; // If not in region
        }
    }

    return eventAffected;
}

Then used like

a.Define('HEMscore','HEMdropOrWeight("%s",run,LeadJet_eta,LeadJet_phi)'%setname)
a.Cut('HEMcut','HEMscore>0') # HEM region dropped if needed
a.Define('HEM__vec','ROOT::VecOps::RVec<float> hemscore {HEMscore}; return hemscore;')

Add ability to save out RunChain

From Mattermost:

@mroguljic :
A part of my workflow is producing skims, where I load a list of rootfile with TIMBER, cut on some criteria, do a snapshot of the columns that I want to keep and store it as "Events" tree in the output file. https://github.com/mroguljic/X_YH_4b/blob/master/skim.py#L80

Is there a way, in TIMBER, to also save the combined runs tree in the output file?

@lcorcodilos
not at the moment but we could set something up
currently, it gets removed from memory here

TIMBER/TIMBER/Analyzer.py

Line 148 in adbffa0

del RunChain

If we just make RunChain to be self.RunChain (and don't del), it could be accessed for saving (though it wouldn't be a snapshot since that happens on the RDataFrame where you've done your slimming which is the Events tree)

a = analyzer('...')
a.RunChain.Merge('output.root')

The problem is that TChain.Merge() cannot update a rootfile - it only recreates. I tried another form which takes a TFile already made like

fout = ROOT.TFile.Open('test.root','UPDATE','',1)
fout.cd()
basketsize = 0
chain.Merge(fout,basketsize)
fout.Close()

I get a seg-fault though... :-(

So if one wants to store Events and Runs together in the same file, we need to play games a little. Either the Snapshot needs to happen after the Runs file is saved (since we can snapshot to a file that already exists) or one can simply hadd the two separate files together afterwards. Perhaps the ~~smartest~~ most general thing to do would be this:

def SaveRunChain(filename):
    if not os.exists(filename): # If it doesn't already exist, nothing special
        self.RunChain.Merge(filename)
    else:
        merge_filename = filename.replace('.root','_temp1.root')
        current_filename = filename.replace('.root','_temp2.root')
        ExecuteCmd('cp %s %s'%(filename, current_filename)) # copy existing file to <filename>_temp2.root
        self.RunChain.Merge(merge_filename) # create merged tree as <filename>_temp1.root
        ExecuteCmd('hadd -f %s %s'%(filename,merge_filename,current_filename)) # hadd them together into original filename (force overwrite not the greatest)
        ExecuteCmd('rm %s %s'%(merge_filename,current_filename)) # clean up

I should note that I haven't tested doing this yet though.

Speed up Prefire module with non-collection version

On the scale of things, this module can slow things considerably. It only needs the pt and eta of the AK4 jets, electrons, and photons and the jetIdx of the photons and electrons (8 total args) and they are always named the same so there's no extra burden on the user.

Create way for AddCorrection to load existing columns

Useful in the case where a snapshot has been done already with the calculation. The calculation should be recycled without the need for a Correction to be instantiated and evaluated again.

TopTag_SF: Secondary constructor which takes in tau32 cut value rather than WP code

Create full b*->tW all-hadronic example

Adjust clang reader warning about matching columns

This should not show

WARNING: Not able to find arg FatJet_tau2[1]/FatJet_tau1[1] written in /home/lucas/Projects/RDFanalyzer/TIMBER/TIMBER/Framework/src/Wtag_SF.cc in available columns. 
         If `FatJet_tau2[1]/FatJet_tau1[1]` is a value and not a column name, this warning can be ignored.

Consider an APV flag w/ provenance

2016 data and MC can be marked for different chunks.

For data, the "HIPM" indication is in the DAS path and is the first chunk of data from B to part of F (F is split into two sets - one with HIPM and one without).
For MC, the "preVFP" and "postVFP" strings are used to denote the simulations of the two different detector conditions in 2016. The "preVFP" aligns with "HIPM" and "postVFP" aligns with no "HIPM". The samples with the VFP also have "NanoAODv9APV" - APV a third acronym for the same effect.
The pileup is denoted with preVFP and postVFP plus a version with neither (unclear what this version is - an average?)
The JECs are denoted with APV.

Add tool to get lumi json for lumi calc

Port printLumiJson.py

Add automatic reporting to CL for selection on final RDataFrame node

General "look up" class for histogram that opens before looping accessed while looping

The idea is to make a constructor that opens up the histogram and then a member function of the class (eval) that takes as input a branch name to evaluate and find the corresponding bin and bin value in the already opened histogram

We can also start simple and then try adding extra features (like assume that the histogram will always be 1D and once that's working, we can try adding functionality to check for 2D or 3D histograms so those can be used automatically as well)

W tagging SF module

Could be simplified but tau21 module below

#include <vector>
#include <string>

class Wtag_SF {
    private:
        int year_;
        float HP;
        float HP_unc;
        float LP;
        float LP_unc;
    public:
        Wtag_SF(int year);
        ~Wtag_SF(){};
        std::vector<float> eval(float tau21);
};

Wtag_SF::Wtag_SF(int year){
    if (year > 2000) {
        year_ = year - 2000;
    } else {
        year_ = year;
    }

    if (year == 16) {
        HP = 1.00; HP_unc = 0.06;
        LP = 0.96; LP_unc = 0.11;
    } else if (year == 17) {
        HP = 0.97; HP_unc = 0.06;
        LP = 1.14; LP_unc = 0.29;
    } else if (year == 18) {
        HP = 0.98; HP_unc = 0.027;
        LP = 1.12; LP_unc = 0.275;
    } else {
        throw "Wtag_SF: Year not supported";
    }
}

std::vector<float> Wtag_SF::eval(float tau21) {
    std::vector<float> out;

    if (year_ == 16) {
        if (tau21 < 0.4) {
            out = {HP, HP+HP_unc, HP-HP_unc};
        } else {
            out = {LP, LP+LP_unc, LP-LP_unc};
        }

    } else if (year_ == 17) {
        if (tau21 < 0.45) {
            out = {HP, HP+HP_unc, HP-HP_unc};
        } else if (tau21 < 0.75) {
            out = {LP, LP+LP_unc, LP-LP_unc};
        } else {
            out = {1., 1., 1.};
        }

    } else if (year_ == 18) {
        if (tau21 < 0.4) {
            out = {HP, HP+HP_unc, HP-HP_unc};
        } else if (tau21 < 0.75) {
            out = {LP, LP+LP_unc, LP-LP_unc};
        } else {
            out = {1., 1., 1.};
        }
    } 

    return out;
}

Optimize SofA collections with DOD

Very nice blog posts on this here doing almost exactly what is needed - create object which is a proxy to column values and reads to the cache the actual used vector elements and does not waste cache space.

Swig (or other) GenMatching functionality for simple python experimenting

Save all filters in a chain to a LaTeX table

Algorithm:

Start at final node
Scale up it and track all filters in dictToLatex friendly OrderedDict
- Change dictToLatex format to be paper friendly (caption at top, no vertical lines, etc)
Reverse dict order (since they'll have been added end->start)

Bonus: Create a function to compare columns of a row. If they are the same, merge them into one cell.

Include generated event amount in CutflowTxt()

Need to document installation of clang/cindex for python

pip install clang

This may not work automatically for some systems though. One may get the following error:

clang.cindex.LibclangError: libclang.so: cannot open shared object file: No such file or directory. To provide a path to libclang use Config.set_library_path() or Config.set_library_file().

In this case, python cannot find the shared library, libclang.so. This is either because clang (which clang to check) is not installed or the shared library is not called libclang.so. The real location and/or name will depend on the system but for Ubuntu 18.04 for example, the needed library is /usr/lib/x86_64-linux-gnu/libclang-6.0.so.1.

The issue can be solved simply by creating a symbolic link

sudo ln -s libclang.so.1 libclang.so

Add pileup weight module

NanoAODtools version here

Add prefire correction module

Version in NanoAODtools here

Top tagging SF module

For tau32+sj btag below. Needs quite a bit of work but this complicated because of the matching requirement.

#include <string>
#include "TFile.h"
#include "TH1.h"
#include <Math/Vector4Dfwd.h>
#include <Math/GenVector/LorentzVector.h>
#include <Math/GenVector/PtEtaPhiM4D.h>

#include "TIMBER/TIMBER/Framework/include/GenMatching.h"
using LVector = ROOT::Math::PtEtaPhiMVector;

class TopTag_SF {
    private:
        std::string workpoint_name;
        int year_;
        TFile* file;
        TH1* mergedHist;
        TH1* semimergedHist;
        TH1* notmergedHist;
        TH1* mergedHist_up;
        TH1* semimergedHist_up;
        TH1* notmergedHist_up;
        TH1* mergedHist_down;
        TH1* semimergedHist_down;
        TH1* notmergedHist_down;
    public:
        TopTag_SF(int year, int workpoint, bool NoMassCut=false);
        ~TopTag_SF(){};
        int NMerged(LVector top_vect, vector<Particle*> Ws,
                vector<Particle*> quarks, GenParticleTree GPT);

        RVec<double> eval(LVector top_vect, int nGenPart,
                RVec<float> GenPart_pt, RVec<float> GenPart_eta,
                RVec<float> GenPart_phi, RVec<float> GenPart_mass,
                RVec<int> GenPart_pdgId, RVec<int> GenPart_status,
                RVec<int> GenPart_statusFlags, RVec<int> GenPart_genPartIdxMother);
};

TopTag_SF::TopTag_SF(int year, int workpoint, bool NoMassCut) {
    workpoint_name = "wp"+std::to_string(workpoint);
    if (year > 2000) {
        year_ = year - 2000;
    } else {
        year_ = year;
    }

    std::string filename = "SFs/20"+std::to_string(year_)+"TopTaggingScaleFactors";
    if (NoMassCut){filename += "_NoMassCut.root";}
    else {filename += ".root";}

    std::string mergedName, semimergedName, notmergedName,
                mergedName_up, semimergedName_up, notmergedName_up,
                mergedName_down, semimergedName_down, notmergedName_down;
    
    mergedName = "PUPPI_"+workpoint_name+"_btag/sf_mergedTop_nominal";
    semimergedName = "PUPPI_"+workpoint_name+"_btag/sf_semimergedTop_nominal";
    notmergedName = "PUPPI_"+workpoint_name+"_btag/sf_notmergedTop_nominal";

    mergedName_up = "PUPPI_"+workpoint_name+"_btag/sf_mergedTop_up";
    semimergedName_up = "PUPPI_"+workpoint_name+"_btag/sf_semimergedTop_up";
    notmergedName_up = "PUPPI_"+workpoint_name+"_btag/sf_notmergedTop_up";

    mergedName_down = "PUPPI_"+workpoint_name+"_btag/sf_mergedTop_down";
    semimergedName_down = "PUPPI_"+workpoint_name+"_btag/sf_semimergedTop_down";
    notmergedName_down = "PUPPI_"+workpoint_name+"_btag/sf_notmergedTop_down";

    file = TFile::Open(filename.c_str());
    mergedHist = (TH1*)file->Get(mergedName.c_str());    
    semimergedHist = (TH1*)file->Get(semimergedName.c_str());
    notmergedHist = (TH1*)file->Get(notmergedName.c_str());

    mergedHist_up = (TH1*)file->Get(mergedName_up.c_str());    
    semimergedHist_up = (TH1*)file->Get(semimergedName_up.c_str());
    notmergedHist_up = (TH1*)file->Get(notmergedName_up.c_str());

    mergedHist_down = (TH1*)file->Get(mergedName_down.c_str());    
    semimergedHist_down = (TH1*)file->Get(semimergedName_down.c_str());
    notmergedHist_down = (TH1*)file->Get(notmergedName_down.c_str());
}

RVec<double> TopTag_SF::eval(LVector top_vect, int nGenPart,
        RVec<float> GenPart_pt, RVec<float> GenPart_eta,
        RVec<float> GenPart_phi, RVec<float> GenPart_mass,
        RVec<int> GenPart_pdgId, RVec<int> GenPart_status,
        RVec<int> GenPart_statusFlags, RVec<int> GenPart_genPartIdxMother){
    
    GenParticleObjs GenParticles (GenPart_pt, GenPart_eta,
                                  GenPart_phi, GenPart_mass,
                                  GenPart_pdgId, GenPart_status,
                                  GenPart_statusFlags, GenPart_genPartIdxMother);

    GenParticleTree GPT;
    // prongs are final particles we'll check
    vector<Particle*> Ws, quarks; 

    for (size_t i = 0; i < nGenPart; i++) {
        GenParticles.SetIndex(i); // access ith particle
        Particle* this_particle = &GenParticles.particle;
        GPT.AddParticle(this_particle); // add particle to tree
        int this_pdgId = *(this_particle->pdgId);
        if (abs(this_pdgId) == 24) {
            Ws.push_back(this_particle);
        } else if (abs(this_pdgId) >= 1 && abs(this_pdgId) <= 5) {
            quarks.push_back(this_particle);
        }
    }
    int nMergedProngs = NMerged(top_vect, Ws, quarks, GPT);

    TH1* hnom; TH1* hup; TH1* hdown;
    if (nMergedProngs == 3){
        hnom = mergedHist;
        hup = mergedHist_up;
        hdown = mergedHist_down;
    } else if (nMergedProngs == 2) {
        hnom = semimergedHist;
        hup = semimergedHist_up;
        hdown = semimergedHist_down;
    } else if (nMergedProngs == 1) {
        hnom = notmergedHist;
        hup = notmergedHist_up;
        hdown = notmergedHist_down;
    } else {
        return {1, 1, 1};
    }
        
    int sfbin_nom, sfbin_up, sfbin_down;
    if (top_vect.Pt() > 5000){
        sfbin_nom = hnom->GetNbinsX();
        sfbin_up = hup->GetNbinsX();
        sfbin_down = hdown->GetNbinsX();
    } else {
        sfbin_nom = hnom->FindFixBin(top_vect.Pt());
        sfbin_up = hup->FindFixBin(top_vect.Pt());
        sfbin_down = hdown->FindFixBin(top_vect.Pt());
    }
    return {hnom->GetBinContent(sfbin_nom),
            hup->GetBinContent(sfbin_up),
            hdown->GetBinContent(sfbin_down)};

}

int TopTag_SF::NMerged(LVector top_vect, vector<Particle*> Ws, vector<Particle*> quarks, GenParticleTree GPT) {
    int nmerged = 0;
    // prongs are final particles we'll check
    vector<Particle*> prongs; 

    Particle *q, *bottom_parent;
    for (size_t iq = 0; iq < quarks.size(); iq++) {
        q = quarks[iq];
        if (abs(*(q->pdgId)) == 5) { // if bottom
            bottom_parent = GPT.GetParent(q);
            if (bottom_parent->flag != false) { // if has parent
                // if parent is a matched top
                if (abs(*(bottom_parent->pdgId)) == 6 && bottom_parent->DeltaR(top_vect) < 0.8) { 
                    prongs.push_back(q);
                }
            }
        }
    }

    Particle *W, *this_W, *wChild, *wParent;
    vector<Particle*> this_W_children;
    for (size_t iW = 0; iW < Ws.size(); iW++) {
        W = Ws[iW];
        wParent = GPT.GetParent(W);
        if (wParent->flag != false) {
            // Make sure parent is top that's in the jet
            if (abs(*(wParent->pdgId)) == 6 && wParent->DeltaR(top_vect) < 0.8) {
                this_W = W;
                this_W_children = GPT.GetChildren(this_W);
                // Make sure the child is not just another W
                if (this_W_children.size() == 1 && this_W_children[0]->pdgId == W->pdgId) {
                    this_W = this_W_children[0];
                    this_W_children = GPT.GetChildren(this_W);
                }
                // Add children as prongs
                for (size_t ichild = 0; ichild < this_W_children.size(); ichild++) {
                    wChild = this_W_children[ichild];
                    int child_pdgId = *(wChild->pdgId);
                    if (abs(child_pdgId) >= 1 && abs(child_pdgId) <= 5) {
                        prongs.push_back(wChild);
                    }
                } 
            }
        }
    }

    for (size_t iprong = 0; iprong < prongs.size(); iprong++) {
        if (prongs[iprong]->DeltaR(top_vect) < 0.8) {
            nmerged++;
        }
    }

    return nmerged;
}

Consolidate CutflowHist and CutflowTxt

Duplicate code currently.

New JME module constructors that don't use collection structs

Makes the modules portable outside of TIMBER (and probably faster)

Docstring for analyzer.Apply() incomplete

Fix conflict between snapshotting all columns and collection structs

Right now, if someone snapshots with columns == 'all' while also using the collection structs (via something like the AutoJME tool), an error will occur because the struct cannot be written to a TTree. The block below should be added before here [1] to check for the struct and skip adding it to the snapshot.

[1] -

TIMBER/TIMBER/Analyzer.py

Line 1601 in e77e15d

self.DataFrame.Snapshot(treename,outfilename,'',opts)

columns_to_save = []
for c in a.GetColumnNames():
    if c.split('_')[0] in a.GetCollectionNames():
        continue
    elif c.endswith('s') and c[:-1] in a.GetCollectionNames():
        continue
    else:
        columns_to_save.append(c)

for c in a.GetCollectionNames():
    columns_to_save.append(c+'_[^_].*')

Implement clang parsing for `corr` type Correction

Add method to get weight string for a correction...

...to plug into a Histo*D call.

Ex.

a.AddCorrection(corr1,...)
a.AddCorrection(corr2,...)
a.MakeWeightCols('base')

histo = a.DataFrame.Histo1D(...,a.GetWeightStr('base',corr1,'up'))
#... or ...
histo = a.DataFrame.Histo1D(...,a.GetWeightStr('base',corr1.name,'up'))

Does not need to be an analyzer method because the naming is predictable but if it is, we can check that it exists first and throw an error if it doesn't (to help avoid headaches from RDataFrame seg faults when mistakes are made). Also, an external function would not gain anything in terms of being accessible to those not using analyzer() because the weight columns will only be built if one is using analyzer().

CompareShapes() cannot take in empty bkg or signals arguements

Traceback (most recent call last):
  File "exercises/example.py", line 185, in <module>
    CompareShapes(plot_filename,options.year,varnames[varname],bkgs={},signals=signal_hists,colors=colors,names=names)
  File "build/bdist.linux-x86_64/egg/TIMBER/Tools/Plot.py", line 96, in CompareShapes
IndexError: list index out of range

Auto-convert .dot if graphviz-dev is installed

Add to PrintNodeTree() a test of whether dot has ability to render to PNG and do that if possible (instead of just rendering to dot).

Library compiling doesn't play nice with periods in TIMBERPATH

From @mroguljic

"TIMBERPATH at lxplus starts with /afs/cern.ch/user/m/mrogulji/
This line changes the path where it's trying to find the library to /afs/cern_ch/user/m/mrogulji/

TIMBER/TIMBER/Tools/Common.py

Line 123 in b943205

lib_path = blockcode.replace('.','_')+'.so'

I changed the line to lib_path = blockcode.replace('.cc','_cc.so') and then it seems to work"

This should obviously be fixed but we should also do a quick string parse to dynamically get the file extension in the case that .cc is not used.

JER ids
PUPPI SD mass correction and validate msoftdrop
Updated JMR and JMS values (only match the NanoAODtools right now)

AK4 missing pieces:

Have not attempted to implement the NanoAODtools version at all - we only have what the AK8 version can also accommodate (so AK4 value lookup works but manipulation afterwards is not done).
JER gen jet matching needs to be changed a bit
Nothing currently done with MET

Add benchmark tests

Plotting the Missing ET (or any event level variable).
Plotting the Jet pT (or any variable that is a per-event array).
Plotting the Jet pT for jets that have an jet pT > 20 GeV and abs(jet eta) < 1.0
Plotting the Missing ET for jets with at least 2 jets with Jet pT > 40 and abs(jet Eta) < 1.0
Plot the opposite-sign muon pair mass for all combinations of muons
Plot the Missing ET for events that have an opposite-sign muon pair mass in the range 60-120 GeV (double loop over single collection, math)
Plot the sum of the pT of jets with pT > 30 GeV that are not within 0.4 from any lepton with pt > 10 GeV (looping over two collections)
Plot the transverse mass of the Missing ET and a lepton for events with at least 3 leptons and two of the leptons make up a Z (same flavor, opp sign, closest to invar mass of 91.2 GeV), where the lepton being plotted is the third lepton.
Create a group of plots (jet pT, eta, phi, N_jets). Now make it for all events, for events with missing et > 20 GeV, and for events with missing et > 20 GeV and 2 jets with 40 GeV and abs(eta) < 1.0. Demonstrate making “groups” of plots, and a graphical cut flow

Possible bug in MakeSoverB?

The integral of the actual s/sqrt(b) looked a little strange during the last DAS. Need to investigate.