reco-ntuples's Introduction

reco-ntuples

Home of the Ntuplizer for the HGCAL reconstruction software studies

Ntuple content definitions can be found at Definitions.md.

This version is based on CMSSW_11_0_0_pre10 or later. Please check whether later CMSSW_11_X_Y versions are available to benefit from bug fixes.

cmsrel CMSSW_11_0_0_pre10
cd CMSSW_11_0_0_pre10/src
cmsenv
git clone git@github.com:CMS-HGCAL/reco-ntuples.git RecoNtuples
cd RecoNtuples
git checkout -b topic_${USER}
cd ../
scram b -j4

The input file needs to be step3 (i.e. RECO). Example configs are provided in HGCalAnalysis/test.

Depending on your RECO input file, set inputTag_HGCalMultiCluster in the configuration of the HGCalAnalysis EDAnalyzer (i.e. the ntupliser) to either hgcalMultiClusters (newer releases) or hgcalLayerClusters (older releases).

For older versions, please check the releases tab.

reco-ntuples's People

Contributors

adavidzh, ajgilbert, artlbv, beaudett, clelange, edjtscott, felicepantaleo, gvonsem, jkiesele, lgray, malgeri, predragm, riga, rovere, selvaggi, shahrukhqasim

reco-ntuples's Issues

Missing layer clusters (halo hits)

This is to illustrate the issue of sporadically missing layer clusters.

It was observed that good multiclusters can sometimes miss one (or several) layer clusters even close to the shower maximum.

  • This plot shows all rechits associated to any multicluster for a single 25 GeV photon. The color coding reflects the rechit_flag: 0 stands for core hits, while 2 are halo hits.
    It is clearly visible that in layer 9 most rechits are halo hits.
    image

  • This plot shows the same event as the previous but only for layer 9. There are only 2 core hits.
    image

  • In this plot the colour of the rechits corresponds to rechit energy. Most energy is deposited in the central hits.
    image

  • In this plot the colour of the rechits corresponds to the associated 2d cluster index. There are multiple 2d clusters, while only 1 has core hits.
    image

  • In this plot the colour of the rechits corresponds to the associated multicluster index. The central hits (light blue colour) correspond to a multicluster with ~22 GeV, i.e. the one matching the photon. Since these hits are all halo hits, their energy (~6 GeV) is also not taken into account in the total (~90 GeV).
    image

  • In fact, the proper hits are assigned to two adjacent layer clusters. The color reflects the 2d cluster index.
    image
    The boxes show the cluster seeds, and they appear to be adjacent cells. This would mean that both hits pass the delta_c cut, but at least one of them should have a delta of 1 cm, i.e. the cell distance.
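
To make the argument about adjacent seeds concrete, here is a toy sketch. It assumes the usual density-peak definition, in which delta is the distance to the nearest hit with a higher local density; the function name, coordinates and densities are made up for illustration, and this is not the CMSSW implementation.

```python
import math


def nearest_higher_distance(hits):
    """Return, for each hit, the distance to the nearest hit with higher density.

    hits is a list of (x_cm, y_cm, rho) tuples. The hit with the highest
    density has no such neighbour and gets math.inf.
    """
    deltas = []
    for i, (xi, yi, rho_i) in enumerate(hits):
        best = math.inf
        for j, (xj, yj, rho_j) in enumerate(hits):
            if j != i and rho_j > rho_i:
                best = min(best, math.hypot(xi - xj, yi - yj))
        deltas.append(best)
    return deltas


# Two adjacent cells (1 cm apart) plus one hit further out (illustrative values).
toy_hits = [(0.0, 0.0, 5.0), (1.0, 0.0, 4.0), (3.0, 0.0, 1.0)]
toy_deltas = nearest_higher_distance(toy_hits)
```

In this toy case the second hit sits next to a denser neighbour, so its delta equals the cell distance. Under the stated definition it could only be promoted to a seed if the delta_c cut were below that distance.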

I'll try to find this behaviour in the cmssw execution and post an update later.

Installation recipe does not compile (CMSSW_10_2_0_pre1)

Hello
the recipe from the ReadMe does not work.

The compilation fails with:
CMSSW_10_2_0_pre1/src/RecoNtuples/HGCalAnalysis/plugins/HGCalAnalysis.cc:1767:42: error: 'class hgcal::RecHitTools' has no member named 'getPositionLayer'; did you mean 'getPosition'?
const GlobalPoint pos = recHitTools_.getPositionLayer(ilayer);
^~~~~~~~~~~~~~~~
I can take a look at it unless someone already knows how to fix it.
F.

Bad FastSimEvent dependency in reco-ntuple.

I'm not sure why we had to reuse FastSimulation tools inside the reco-ntuple.
Inspired by the problem Jeremy reported last Monday, I discovered that the FastSimEvent does implicitly make assumptions (hardcoded, of course) on the geometry that are indeed false and meaningless for the HGCAL case.
I'm, in particular, referring to these values that are required from here. This confuses the ntuplizer, which then simply skips valid simTracks for no good reason.
And no, of course they are not configurable.

change ntupliser from class-based content to flat ntuple

There are a few handicaps with the class-based ntuple:

  • cannot analyse ntuples without CMSSW/compiling and loading classes
  • whenever the classes change, older ntuples cannot be read anymore

I would like to change the ntuple such that it contains only flat branches (and vectors, we probably need vector<vector>). This would mean that everyone needs to change their analysis scripts, but I hope it's not too much work.
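
As a rough illustration of what "flat" could mean here, the sketch below turns a list of cluster records into index-aligned branches, with a list of lists standing in for vector<vector>. The record keys and branch names are hypothetical and only serve the example; plain Python lists stand in for ROOT branches.

```python
def flatten_clusters(clusters):
    """Turn a list of cluster records into flat, index-aligned branches.

    Each input record is a dict with the hypothetical keys "energy",
    "layer" and "rechits" (a list of rechit indices).
    """
    flat = {
        "cluster2d_energy": [],   # one float per cluster (vector<float>)
        "cluster2d_layer": [],    # one int per cluster (vector<int>)
        "cluster2d_rechits": [],  # one list per cluster (vector<vector<int>>)
    }
    for cluster in clusters:
        flat["cluster2d_energy"].append(cluster["energy"])
        flat["cluster2d_layer"].append(cluster["layer"])
        flat["cluster2d_rechits"].append(list(cluster["rechits"]))
    return flat
```

Entry i of every branch then describes the same cluster, so an analysis script only needs basic types and no compiled class dictionaries.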

Let me know what you think by replying here. We'll probably introduce a branch in parallel to master so that people can have a look before changing the default if people agree.

reachedEE logic

I'm not sure I can follow the logic of reachedEE in this code snippet. To my limited understanding and debugging, the reachedEE==1 case is never a valid one.
Also the hardcoded 160 and 25 are something we should avoid as much as possible.

Descope ntupliser to dumping collections and links only

We will descope the ntupliser such that it will only dump collections and the links between them. In particular, we'll remove:

  • code that calculates additional variables, such as those for electrons
  • MultiClusters, since they are currently not maintained

Eventually, the ntupliser will be moved to CMSSW.

need provenance info in ntuples

We have a lot of different ntuples available by now, but since things change quite frequently, we should add some provenance information such as

  • CMSSW release used
  • reco-ntuples and reco-prodtools git tags used
  • what else?

An alternative could be a simple txt file that contains this information and maybe the config files used.
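
A minimal sketch of the sidecar-file idea could look like the following. It assumes the release name is available in the CMSSW_VERSION environment variable after cmsenv and that `git describe --tags --always` is good enough to identify a checkout; the function names, dictionary keys and the choice of JSON over plain text are illustrative only.

```python
import json
import os
import subprocess


def git_describe(repo_dir):
    """Return the output of `git describe --tags --always`, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None


def collect_provenance(repo_dirs, env=os.environ):
    """Gather the release name and one git description per repository.

    repo_dirs maps a label (e.g. "reco-ntuples") to a local checkout path.
    """
    return {
        "cmssw_release": env.get("CMSSW_VERSION"),
        "git_tags": {name: git_describe(path) for name, path in repo_dirs.items()},
    }


def write_provenance(path, provenance):
    """Write the provenance dictionary as a small JSON file next to the ntuple."""
    with open(path, "w") as handle:
        json.dump(provenance, handle, indent=2, sort_keys=True)
```

The same dictionary could equally be stored inside the ntuple file itself; the sidecar file is just the simpler variant to prototype.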

Need to decide how to implement rechit-cluster association with sharing enabled

The ntupliser currently assumes that sharing is not enabled and therefore a 1-to-1 association between rechits and layer clusters exists: https://github.com/CMS-HGCAL/reco-ntuples/blob/master/HGCalAnalysis/plugins/HGCalAnalysis.cc#L1831-L1851

With sharing enabled, a rechit can be associated to more than one cluster. If we want to preserve this information in the ntuple, we would need to change rechit_cluster2d_ (the variable that stores the index of the associated layer cluster, https://github.com/CMS-HGCAL/reco-ntuples/blob/master/HGCalAnalysis/plugins/HGCalAnalysis.cc#L1917) to a vector of integers.
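
One possible shape for the one-to-many association is sketched below. It assumes each layer cluster provides a list of (DetId, fraction) pairs, as hitsAndFractions() does; the function name and the decision to store the fraction next to the cluster index are illustrative choices, not existing ntupliser code.

```python
from collections import defaultdict


def build_rechit_to_clusters(clusters):
    """Map each rechit DetId to all layer clusters it contributes to.

    clusters is a list whose position is the layer cluster index and whose
    entries are lists of (detid, fraction) pairs.
    """
    rechit_clusters = defaultdict(list)
    for cluster_index, hits_and_fractions in enumerate(clusters):
        for detid, fraction in hits_and_fractions:
            rechit_clusters[detid].append((cluster_index, fraction))
    return dict(rechit_clusters)
```

With sharing disabled every list has exactly one entry, so the current single-index behaviour is the special case of this layout.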

Any comments or alternative suggestions?

Extend PFCluster information

PFClusters, which are the realistic SimClusters for hadronic particles, need more information:

  • correctedEnergy()
  • hitsAndFractions()

Anything else?

More information on the simulated track

I am currently working on the following:
Additions:
A1) extrapolation of each generated particle to the first HGCal layer (whether it decayed or not). Corresponding 4 branches (exx,exy,exeta,exphi) are added to the ntuple. Needed for DNN studies.
A2) origin vertex information (x,y,z), there used to be only the decay vertex information

Bugfixes:
B1) change all extrapolations to FullSim rather than FastSim. FastSim does not do a good job in the inhomogeneous field in the endcaps

Changes:
C1) Right now, each electron or particle that fulfils "myTrack.genpartIndex()>=0" gets fully propagated to the HGCal and its decay vertex is set to the first HGCal layer, even if it decays in the meantime. This overlaps with the additions in A1 and was always affected by bug B1. Please let me know if this feature is used by anybody. Since the same information is now given by addition A1, I would propose to remove this feature.
For the DNN studies, I need the information on the real decay vertex, that I would otherwise need to add for these particular particles in new branches.

Add Electrons from MultiClusters in the ntuplizer

Dear all,
instead of writing a longish email to explain something that should be already clear from the subject, I just paste here the link to the comparison between my branch and the current HEAD with all the changes I propose to include.

Besides adding electrons (and their associated multiclusters decomposed in PFClusters), I also added a very simple python prototype to be used for quick analysis, with a rudimentary Zee example. Maybe you can find it useful.

https://github.com/CMS-HGCAL/reco-ntuples/compare/master...rovere:devel?w=1

RecHit collection seems incomplete

When matching hits via DetId between SimClusters (based on what's stored in hits and fractions) and RecHits, there are more hits associated to the SimCluster than there are in the RecHits collections. This is not the case for PFClusters. The ntupliser code should dump all RecHits there are, so there must be some thresholds applied to the RecHits before being saved to the respective subdetector collections. This has not been the case in the past, so this is not a bug in the ntupliser, but possibly a (desired?) feature in CMSSW. Investigating in CMSSW...
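
The check described above can be sketched as a simple set comparison. The inputs are assumed to be a SimCluster's (DetId, fraction) pairs and the DetIds of all dumped RecHits; the function name and the example values are hypothetical.

```python
def missing_rechits(simcluster_hits_and_fractions, rechit_detids):
    """Return the SimCluster DetIds that have no matching RecHit.

    simcluster_hits_and_fractions is a list of (detid, fraction) pairs,
    rechit_detids is any iterable of DetIds found in the RecHit collections.
    """
    available = set(rechit_detids)
    return [
        detid
        for detid, _fraction in simcluster_hits_and_fractions
        if detid not in available
    ]
```

A non-empty result for SimClusters, together with an empty one for PFClusters, would reproduce the observation reported here.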

Hardcoded total number of layers

The number of layers (52) is hardcoded in the call to retrieveLayerPositions. V10 (D41) no longer has 52 layers but 50, so it would be good to read this value from CMSSW. A related issue is [1].

[1] cms-sw/cmssw#26225

add GenParticle information

Add this information (4-vector, PDG ID + ?), but with a flag that is set to False (= do not write out) by default. What about filtering on status code?
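
The proposed behaviour could be sketched as follows. The flag name write_gen, the allowed_status argument and the "status" dictionary key are hypothetical and only illustrate the default-off flag with an optional status-code filter.

```python
def select_gen_particles(gen_particles, write_gen=False, allowed_status=None):
    """Select the GenParticles that would be written to the ntuple.

    Nothing is written unless write_gen is True. If allowed_status is given,
    only particles whose status code is in that collection are kept.
    """
    if not write_gen:
        return []
    return [
        particle
        for particle in gen_particles
        if allowed_status is None or particle["status"] in allowed_status
    ]
```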
