Comments (17)

tamasgal commented on June 11, 2024

Ah I see:

julia> UnROOT.LazyTree(f, "podio_metadata", ["events___idTable"])
fID = -2   # <- added a @show here...
ERROR: BoundsError: attempt to access 2-element Vector{Any} at index [-1]
Stacktrace:
  [1] getindex(A::Vector{Any}, i1::Int64)
    @ Base ./essentials.jl:13
  [2] streamerfor(f::ROOTFile, branch::UnROOT.TBranchElement_10)
    @ UnROOT ~/Dev/UnROOT.jl/src/root.jl:161

Yes, that negative fID is weird. I have some notes on it but I have no solution yet.

EDIT: and yes, if you go to the deepest split level and there is an interpretation (like the one for vector<unsigned int>) you will not hit the logic with the fID

from unroot.jl.

tamasgal commented on June 11, 2024

In this case UnROOT.streamerfor needs to derive the parser logic from the actual streamer, which is there, but the lookup fails. The lookup here cannot be index-based (on fID); the streamer can instead be retrieved via its fName (below I also printed the available streamers).

It all boils down to bringing the automatic parser generation to this level, so that it works without relying on the split branches.

julia> UnROOT.streamerfor(f, "podio::CollectionIDTable")
e.streamer.fName = "TObject"
e.streamer.fName = "TCollection"
e.streamer.fName = "podio::GenericParameters"
e.streamer.fName = "pair<string,vector<int> >"
e.streamer.fName = "pair<string,vector<float> >"
e.streamer.fName = "pair<string,vector<string> >"
e.streamer.fName = "pair<string,vector<double> >"
e.streamer.fName = "vector<int>"
e.streamer.fName = "vector<float>"
e.streamer.fName = "edm4hep::CaloHitContributionData"
e.streamer.fName = "edm4hep::Vector3f"
e.streamer.fName = "podio::ObjectID"
e.streamer.fName = "edm4hep::CalorimeterHitData"
e.streamer.fName = "edm4hep::ClusterData"
e.streamer.fName = "edm4hep::ParticleIDData"
e.streamer.fName = "edm4hep::SimCalorimeterHitData"
e.streamer.fName = "edm4hep::ReconstructedParticleData"
e.streamer.fName = "edm4hep::VertexData"
e.streamer.fName = "edm4hep::EventHeaderData"
e.streamer.fName = "edm4hep::SimTrackerHitData"
e.streamer.fName = "edm4hep::Vector3d"
e.streamer.fName = "edm4hep::MCRecoTrackerHitPlaneAssociationData"
e.streamer.fName = "edm4hep::TrackerHitPlaneData"
e.streamer.fName = "edm4hep::Vector2f"
e.streamer.fName = "edm4hep::ObjectID"
e.streamer.fName = "edm4hep::MCParticleData"
e.streamer.fName = "edm4hep::Vector2i"
e.streamer.fName = "edm4hep::RecoParticleVertexAssociationData"
e.streamer.fName = "edm4hep::MCRecoCaloAssociationData"
e.streamer.fName = "edm4hep::TrackData"
e.streamer.fName = "edm4hep::TrackState"
e.streamer.fName = "edm4hep::Quantity"
e.streamer.fName = "podio::CollectionIDTable"
UnROOT.StreamerInfo(UnROOT.TStreamerInfo{UnROOT.TObjArray}("podio::CollectionIDTable", "", 0xe9251d6f, 1, UnROOT.TObjArray("", 0, Any[UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "m_collectionIDs"
  fTitle: String ""
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<unsigned int>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 13
, UnROOT.TStreamerSTL
  version: UInt16 0x0004
  fOffset: Int64 0
  fName: String "m_names"
  fTitle: String ""
  fType: Int32 500
  fSize: Int32 24
  fArrayLength: Int32 0
  fArrayDim: Int32 0
  fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
  fTypeName: String "vector<string>"
  fXmin: Float64 0.0
  fXmax: Float64 0.0
  fFactor: Float64 0.0
  fSTLtype: Int32 1
  fCtype: Int32 61
])), Set{Any}())

I need to study what uproot is doing with the negative fID, since it's able to get this right:

>>> import uproot

>>> f = uproot.open("/Users/tamasgal/Downloads/Output_REC.root")

>>> f["podio_metadata/events___idTable"]
<TBranchElement 'events___idTable' (2 subbranches) at 0x00010b58eb20>

>>> f["podio_metadata/events___idTable"].array()
<Array [{m_collectionIDs: [...], ...}] type='1 * {m_collectionIDs: var * ui...'>

tamasgal commented on June 11, 2024

Yes... I mean, obviously the information is sitting right in front of us ;) So in that case UnROOT should create the corresponding struct and add a readtype or whatever dynamically. That's what's missing.

tamasgal commented on June 11, 2024

It's just a bit weird that this works fine in so many cases 😆 :

return next_streamer.streamer.fElements.elements[fID + 1] # one-based indexing in Julia
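To make the failure mode concrete, here is a minimal sketch in plain Julia. The `ToyElement` type and the `by_index` / `by_name` helpers are hypothetical stand-ins for illustration, not UnROOT's actual types or functions. It only shows why a zero-based fID of -2 ends up as index -1 after the `+ 1` shift, and how a lookup by fName sidesteps the index entirely.

```julia
# Toy stand-in for a streamer element (hypothetical, not an UnROOT type)
struct ToyElement
    fName::String
    fTypeName::String
end

elements = [
    ToyElement("m_collectionIDs", "vector<unsigned int>"),
    ToyElement("m_names", "vector<string>"),
]

# Index-based lookup: fID is zero-based, Julia arrays are one-based
by_index(elements, fID) = elements[fID + 1]

# Name-based lookup: returns `nothing` when no element has that fName
function by_name(elements, name)
    i = findfirst(e -> e.fName == name, elements)
    return i === nothing ? nothing : elements[i]
end

# A regular, non-negative fID works as expected
first_element = by_index(elements, 0)

# fID = -2 becomes index -1, which is out of bounds for any Julia Vector
threw_bounds_error = try
    by_index(elements, -2)
    false
catch err
    err isa BoundsError
end

# The same element can still be found via its name
names_element = by_name(elements, "m_names")
```

This says nothing about what the negative fID means in ROOT; it only reproduces the shape of the BoundsError shown in the stack trace above.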

Moelf commented on June 11, 2024

@tamasgal this thing hits fID equals -2, I think we're missing something fundamental here

tamasgal commented on June 11, 2024

Actually the only missing thing in this case is the leaf type support for vector<unsigned int> (see #299). I should have added those, so you can blame me ;) The vector<string> stuff is already supported. You don't need a custom streamer.

With #299 the following works (without it, reading the m_collectionIDs part fails):

julia> using UnROOT

julia> f = ROOTFile("/Users/tamasgal/Downloads/Output_REC.root")
ROOTFile with 3 entries and 51 streamers.
/Users/tamasgal/Downloads/Output_REC.root
├─ runs (TTree)
│  └─ "PARAMETERS"
├─ events (TTree)
│  ├─ "AllCaloHitContributionsCombined"
│  ├─ "_AllCaloHitContributionsCombined_particle"
│  ├─ "BeamCal_Hits"
│  ├─ ""
│  ├─ "YokeEndcapCollection"
│  ├─ "_YokeEndcapCollection_contributions"
│  └─ "PARAMETERS"
└─ podio_metadata (TTree)
   ├─ "events___idTable"
   ├─ "events___CollectionTypeInfo"
   ├─ "runs___idTable"
   ├─ "runs___CollectionTypeInfo"
   ├─ "PodioBuildVersion"
   └─ "EDMDefinitions"


julia> LazyBranch(f, "podio_metadata/events___idTable/m_names")
1-element LazyBranch{SubArray{String, 1, Vector{String}, Tuple{UnitRange{Int64}}, true}, UnROOT.Offsetjagg, ArraysOfArrays.VectorOfVectors{String, Vector{String}, Vector{Int32}, Vector{Tuple{}}}}: 
 ["AllCaloHitContributionsCombined", "EventHeader", "BeamCalClusters", "BeamCalClusters_particleIDs", "BeamCalCollection", "BeamCalRecoParticles", "BeamCalRecoParticles_particleIDs", "BeamCal_Hits", "BuildUpVertices", "BuildUpVertices_RP"  …  "TightSelectedPandoraPFOs", "InnerTrackerBarrelHitsRelations", "InnerTrackerEndcapHitsRelations", "OuterTrackerBarrelHitsRelations", "OuterTrackerEndcapHitsRelations", "RefinedVertexJets_rel", "RelationCaloHit", "RelationMuonHit", "VXDEndcapTrackerHitRelations", "VXDTrackerHitRelations"]

julia> LazyBranch(f, "podio_metadata/events___idTable/m_collectionIDs")
1-element LazyBranch{SubArray{UInt32, 1, Vector{UInt32}, Tuple{UnitRange{Int64}}, true}, UnROOT.Offsetjagg, ArraysOfArrays.VectorOfVectors{UInt32, Vector{UInt32}, Vector{Int32}, Vector{Tuple{}}}}: 
 UInt32[0x3a25675d, 0xd793ab91, 0xf0d073dd, 0x1d19206c, 0xc298a348, 0xc29370d2, 0x3954b563, 0xd2b19e7b, 0xfd03f5d0, 0x310a0f04  …  0x5fa7cf93, 0x029be193, 0x743732ae, 0xc42bbbee, 0xd1211017, 0x8dac6bb6, 0x603a5016, 0xdf24625a, 0xbb4cff22, 0x178c9330]

julia> LazyTree(f, "podio_metadata", [Regex("events___idTable/(.*)") => s"\1"])
 Row │ m_names                                                    m_collectionIDs                                ⋯
     │ SubArray{String                                            SubArray{UInt32                                ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ ["AllCaloHitContributionsCombined", "EventHeader", "BeamC  [975529821, 3616779153, 4040192989, 488185964, ⋯
                                                                                                  1 column omitted

tamasgal commented on June 11, 2024

Fixed in v0.10.21.

@peremato let me know if it works for you.

Btw. just a little bit of clarification: the custom parsing always applies to a branch and not a tree (or set of branches). It's usually needed when the split-level is low (so that one needs to deserialise compound structures) or if the type for a specific branch is simply not supported.

Moelf commented on June 11, 2024

huh, I don't know why this doesn't error due to fID== -2, maybe because custom struct logic doesn't hit that?

tamasgal commented on June 11, 2024

How did you get the fID == -2 bubble up? Sorry for my ignorance, I have not looked closely enough 😆

Moelf commented on June 11, 2024

yeah, from my very quick look, uproot does not do anything with fID explicitly

peremato commented on June 11, 2024

> Fixed in v0.10.21.
>
> @peremato let me know if it works for you.
>
> Btw. just a little bit of clarification: the custom parsing always applies to a branch and not a tree (or set of branches). It's usually needed when the split-level is low (so that one needs to deserialise compound structures) or if the type for a specific branch is simply not supported.

First, thanks very much @tamasgal. It works great once you know how to do it.

I still find the way to select branches and leaves very confusing (perhaps due to a lack of proper documentation, or of prior knowledge of how ROOT files are organised). This works nicely:

julia> meta = UnROOT.LazyTree(tfile, "podio_metadata", [Regex("events___idTable/(.*)") => s"\1"])
 Row │ m_names                                                                                                  m_collectionIDs                                ⋯
     │ SubArray{String                                                                                          SubArray{UInt32                                ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ ["AllCaloHitContributionsCombined", "EventHeader", "BeamCalClusters", "BeamCalClusters_particleIDs", "B  [975529821, 3616779153, 4040192989, 488185964, ⋯
                                                                                                                                                1 column omitted

but what I would naively try does not:

julia> meta = UnROOT.LazyTree(tfile, "podio_metadata", ["m_names", "m_collectionIDs"])
ERROR: MethodError: no method matching LazyBranch(::ROOTFile, ::Missing)

Closest candidates are:
  LazyBranch(::ROOTFile, ::AbstractString)
   @ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:134
  LazyBranch(::ROOTFile, ::Union{UnROOT.TBranch, UnROOT.TBranchElement})
   @ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:116

Stacktrace:
 [1] LazyBranch(f::ROOTFile, s::String)
   @ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:134
 [2] LazyTree(f::ROOTFile, tree::UnROOT.TTree, treepath::String, branches::Vector{String}; sink::Type{LazyTree})
   @ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:450
 [3] LazyTree
   @ ~/Development/UnROOT.jl/src/iteration.jl:432 [inlined]
 [4] LazyTree(f::ROOTFile, s::String, branches::Vector{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:393
 [5] LazyTree(f::ROOTFile, s::String, branches::Vector{String})
   @ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:390
 [6] top-level scope
   @ REPL[6]:1

The following works, but the column names are wrong:

julia> meta = UnROOT.LazyTree(tfile, "podio_metadata", ["events___idTable/m_names", "events___idTable/m_collectionIDs"])
 Row │ events___idTabl                                                                                          events___idTabl                                ⋯
     │ SubArray{UInt32                                                                                          SubArray{String                                ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ [975529821, 3616779153, 4040192989, 488185964, 3264783176, 3264442578, 961852771, 3534855803, 424489518  ["AllCaloHitContributionsCombined", "EventHead ⋯
                                                                                                                                                1 column omitted

I also tried the naming convention used for the other tree, "events", with <branch>_<leaf>, but that does not work either. I see that for LazyBranch the convention is <branch>/<leaf>. Overall it is very confusing.

tamasgal commented on June 11, 2024

Yes, the problem is indeed that you need to know a little bit about the subtleties of the ROOT structure. As you can see, uproot also requires you to point to events___idTable but then does the automatic RecArray-creation from the sub-branches. This is of course something I'd like to have in UnROOT as well, but it requires a lot of restructuring. As always, you learn ROOT iteratively and early design decisions need to be changed quite often (I had so many iterations in UnROOT already 😆 ).

I really hope that I will find a longer time slot (2-4 weeks) next year to spend a significant amount of time on refactoring UnROOT.

>>> import uproot

>>> f = uproot.open("/Users/tamasgal/Downloads/Output_REC.root")

>>> f["podio_metadata/events___idTable"]
<TBranchElement 'events___idTable' (2 subbranches) at 0x00010b58eb20>

>>> f["podio_metadata/events___idTable"].array()
<Array [{m_collectionIDs: [...], ...}] type='1 * {m_collectionIDs: var * ui...'>

tamasgal commented on June 11, 2024

Regarding the events tree, you do the same, but also here you need to provide the full path to the sub-branches:

julia> LazyTree(f, "events", [r"BeamCal_Hits/BeamCal_Hits.*\.(\w+)$" => s"\1"])
 Row │ time             x                energyError      energy           y   ⋯
     │ SubArray{Float3  SubArray{Float3  SubArray{Float3  SubArray{Float3  Sub ⋯
─────┼──────────────────────────────────────────────────────────────────────────
 1   │ []               []               []               []               []  ⋯
 2   │ []               []               []               []               []  ⋯
 3   │ []               []               []               []               []  ⋯
 4   │ []               []               []               []               []  ⋯
 5   │ []               []               []               []               []  ⋯
 6   │ []               []               []               []               []  ⋯
 7   │ []               []               []               []               []  ⋯
 8   │ []               []               []               []               []  ⋯
 9   │ []               []               []               []               []  ⋯
 10  │ []               []               []               []               []  ⋯
 11  │ []               []               []               []               []  ⋯
 12  │ []               []               []               []               []  ⋯
 13  │ [0.0, 0.0,       [-8.2, -8.       [0.0, 0.0,       [0.0267, 0       [63 ⋯
 14  │ []               []               []               []               []  ⋯
 15  │ []               []               []               []               []  ⋯
 16  │ []               []               []               []               []  ⋯
 17  │ []               []               []               []               []  ⋯
 18  │ []               []               []               []               []  ⋯
 19  │ [0.0, 0.0]       [3.17, 3.2       [0.0, 0.0]       [0.0305, 0       [-1 ⋯
 20  │ []               []               []               []               []  ⋯
 21  │ []               []               []               []               []  ⋯
 22  │ [0.0, 0.0]       [151.0, 15       [0.0, 0.0]       [0.0128, 0       [-8 ⋯
  ⋮  │        ⋮                ⋮                ⋮                ⋮             ⋱
                                                    4 columns and 3 rows omitted
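For readers unfamiliar with the `regex => s"\1"` pair, here is a small sketch of what such a pair does to a branch path, using only Base Julia's `replace`. The branch paths in `paths` are assumptions made up for illustration (modelled on the regex above), and the sketch makes no claim about how UnROOT applies the pair internally.

```julia
# The same kind of pair as passed to LazyTree above
rename = r"BeamCal_Hits/BeamCal_Hits.*\.(\w+)$" => s"\1"

# Hypothetical full sub-branch paths (assumed for this example)
paths = [
    "BeamCal_Hits/BeamCal_Hits.energy",
    "BeamCal_Hits/BeamCal_Hits.position.x",
    "YokeEndcapCollection/YokeEndcapCollection.energy",
]

# Keep only the paths the regex matches ...
selected = filter(p -> occursin(rename.first, p), paths)

# ... and replace the whole match with the first capture group,
# i.e. the last dot-separated component of the path
columns = [replace(p, rename) for p in selected]

for (p, c) in zip(selected, columns)
    println(rpad(p, 40), " => ", c)
end
```

Because the capture group is anchored with `$`, only the trailing component survives, which is why the columns above are called `time`, `x`, `energy` and so on rather than carrying the branch prefix.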

peremato commented on June 11, 2024

I was not doing this. If I do

julia> events = LazyTree(f, "events", ["BeamCal_Hits"])
 Row │ BeamCal_Hits_en            BeamCal_Hits_ti            BeamCal_Hits_en            BeamCal_Hits_po            BeamCal_Hits_po            BeamCal_Hits_po  ⋯
     │ SubArray{Float3            SubArray{Float3            SubArray{Float3            SubArray{Float3            SubArray{Float3            SubArray{Float3  ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ []                         []                         []                         []                         []                         []               ⋯
 2   │ []                         []                         []                         []                         []                         []               ⋯
 3   │ []                         []                         []                         []                         []                         []               ⋯
 4   │ []                         []                         []                         []                         []                         []               ⋯
 5   │ []                         []                         []                         []                         []                         []               ⋯
 6   │ []                         []                         []                         []                         []                         []               ⋯
 7   │ []                         []                         []                         []                         []                         []               ⋯
 8   │ []                         []                         []                         []                         []                         []               ⋯
 9   │ []                         []                         []                         []                         []                         []               ⋯
 10  │ []                         []                         []                         []                         []                         []               ⋯
 11  │ []                         []                         []                         []                         []                         []               ⋯
 12  │ []                         []                         []                         []                         []                         []               ⋯
 13  │ [0.0, 0.0, 0.0, 0.0, 0.0,  [0.0, 0.0, 0.0, 0.0, 0.0,  [0.0267, 0.0214, 0.0853,   [3290.0, 3290.0, 3290.0,   [-8.2, -8.16, -1.92, 31.1  [63.1, 63.1, 66. ⋯
 14  │ []                         []                         []                         []                         []                         []               ⋯
 15  │ []                         []                         []                         []                         []                         []               ⋯
 16  │ []                         []                         []                         []                         []                         []               ⋯
 17  │ []                         []                         []                         []                         []                         []               ⋯
 18  │ []                         []                         []                         []                         []                         []               ⋯
 19  │ [0.0, 0.0]                 [0.0, 0.0]                 [0.0305, 0.0754]           [-3350.0, -3360.0]         [3.17, 3.21]               [-19.2, -19.2]   ⋯
 20  │ []                         []                         []                         []                         []                         []               ⋯
 21  │ []                         []                         []                         []                         []                         []               ⋯
 22  │ [0.0, 0.0]                 [0.0, 0.0]                 [0.0128, 0.00132]          [3360.0, 3380.0]           [151.0, 151.0]             [-86.8, -86.8]   ⋯
 23  │ [0.0]                      [0.0]                      [2.02f-6]                  [3390.0]                   [-62.9]                    [61.3]           ⋯

and the leaves get the name <branch>_<leaf>

julia> names(events)
8-element Vector{String}:
 "BeamCal_Hits_energyError"
 "BeamCal_Hits_time"
 "BeamCal_Hits_energy"
 "BeamCal_Hits_position_z"
 "BeamCal_Hits_position_x"
 "BeamCal_Hits_position_y"
 "BeamCal_Hits_cellID"
 "BeamCal_Hits_type"

tamasgal commented on June 11, 2024

I mean, technically we can do this LazyTree creation on the fly automatically but I could not come up with a way which works reliably, especially with all those funny (read weird) namings and dot-madness. So eventually we need to ask the user to provide the regex to help UnROOT make reasonable fieldnames like x instead of BeamCal_Hits.position.x which would anyways not be valid due to the dots, so it needs to be translated to BeamCal_Hits_position_x or so, but notice here that BeamCal_Hits is redundant, since the branch is already called like that. ROOT however still stores that with that prefix. BUT not always and I still don't know why. We have some logic in UnROOT which works quite OK but it will still give you funny names in many cases. That's why I introduced that regex-thing, which I highly abuse 😉 see here:

https://github.com/KM3NeT/KM3io.jl/blob/65318a1265fd6bfa064b06a5c4721711160e50f1/src/root/offline.jl#L164-L193

Actually that is basically the place where we would need to incorporate the original streamer, which tells you how to name them and how the hierarchy is structured, but it's quite complex and UnROOT then really would have to define those structs at runtime, which brings us to the...

...painful fact: if you let UnROOT define the structs, you will not be able to use those types in your own analysis code explicitly. Which means that of course Julia will happily pass you the instances, and your function will eat those types as well and everything is fine (and type-stable) but you will not be able to restrict or use those types to utilise multiple dispatch features since they are created on the fly and attached to the UnROOT namespace (that would technically be type piracy) and of course you will have to deal with dynamic dispatch all(?) the time.

That's why I kind of like that we simply use LazyTree, which is a highly parametric type, signalling that it's a universal thing (like a named tuple), but it allows you to hide your data in some container type and/or reinterpret it as your own types. So we force the use of a barrier in order to be able to make use of a solid type system. That's what I have shown in the presentation "KM3io.jl: Making UnROOT.jl comfortable for KM3NeT" (Tamas Gal).

On the other hand, you can of course provide your custom structs and make UnROOT utilise those, so you have full control and maximum efficiency. That's also shown in the presentation above, but of course requires more understanding of the underlying structures.

I use both techniques with great performance.
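As a rough illustration of the barrier idea, the sketch below converts a generic row into a user-owned type at one well-defined point, after which ordinary multiple dispatch works on that type. Everything here is an assumption for illustration: `Hit`, `to_hits` and `total_energy` are hypothetical names, and the `row` NamedTuple merely stands in for one row of a LazyTree; no UnROOT API is used.

```julia
# A user-owned type that analysis code can dispatch on (hypothetical)
struct Hit
    x::Float32
    energy::Float32
end

# Stand-in for one row of a LazyTree: a NamedTuple of per-event vectors
row = (x = Float32[-8.2, -8.16], energy = Float32[0.0267, 0.0214])

# The barrier: generic, universally typed data goes in, concrete user types come out
to_hits(row) = [Hit(x, e) for (x, e) in zip(row.x, row.energy)]

# Downstream code can restrict its signature to the user's own type
total_energy(hits::Vector{Hit}) = sum(h.energy for h in hits)

hits = to_hits(row)
println(total_energy(hits))
```

The point of the design is that the types live in the user's own module, so they can appear in method signatures without relying on structs generated at runtime inside another package's namespace.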

tamasgal commented on June 11, 2024

> I was not doing this. If I do

Yes that works too, if you are fine with the UnROOT naming ;)

peremato commented on June 11, 2024

Hi Tom. I agree we can do several things and hide the UnROOT level. If you want, have a look at what I have been doing with EDM4hep.jl. I am mapping a simple Julia type (isbits) to a set of columns in the LazyTree within a StructArray, in a recursive manner. This is very convenient and performs well for some use cases. There are some examples, like ttbar_digits.jl, that illustrate what you can do. I gave a presentation this week to the team developing this event model. It is very encouraging.
