Code Monkey home page Code Monkey logo

Comments (9)

henryiii avatar henryiii commented on August 19, 2024 1

The call hist.Sumw2(kTRUE) sets the variances equal to the entries. Then the error is the square root of the entries. Uproot is setting the variance equal to the square of the entries, making the error equal to the entries. I think it's a simple bug due to the fact "sumw2" has a square in it, so it looks like it should be set with a square.

If you have a bin with 5 entries, the usual thing to do is assume that it was filled 5 times, so sumw2 = 1² + 1² + 1² + 1² + 1² = 5. Then the error is √5. Usually you do not assume that this was filled once with weight 5 (which would be with error 5).

from uproot3-methods.

jpivarski avatar jpivarski commented on August 19, 2024 1

Sorry—just catching up. Yes: this sounds like a good solution, though I'll understand it better when I see it as a diff in a PR. Thanks!

from uproot3-methods.

jpivarski avatar jpivarski commented on August 19, 2024

The confusing thing about this (I believe this is what you're hitting) is that the TH1 is a NumPy array. That array corresponds to the values (bin contents) of the histogram. It might also have a _fSumw2 attribute attached to it, which is another NumPy array, that corresponds to the errors.

The issue is that ROOT's TH1 is a subclass of TArray, and in Uproot 3 I made TArray a NumPy array subclass. This confusion motivated a less direct modeling of C++ classes in Uproot 4, in which Python objects representing C++ objects contain their superclasses instead of inheriting from them.

Not only did this introduce the strange semantics in which the histogram values are not an attribute; they are the object, but it also made other subclasses of TH1, such as TProfile, have incorrect values and errors (because we didn't explicitly implement it, and it inherited TH1's methods).

At least, that's what I think you're running into above.

from uproot3-methods.

henryiii avatar henryiii commented on August 19, 2024

The problem is this line:

out._fSumw2 = valuesarray ** 2

Which sets fSumw2 assuming that the values in the ndarray were made with a single fill with a large weight, rather than out._fSumw2 = valuesarray, which would assume all fills were made with weight=1 and would match ROOT's behavior when adding weights. (I don't see an obvious path for if the ndarray already has an fSumw2 attribute attached, but didn't check closely).

from uproot3-methods.

matthewfeickert avatar matthewfeickert commented on August 19, 2024

The confusing thing about this (I believe this is what you're hitting) is that the TH1 is a NumPy array. That array corresponds to the values (bin contents) of the histogram. It might also have a _fSumw2 attribute attached to it, which is another NumPy array, that corresponds to the errors.

Hm, it seems like this will always happen though given that it is present here using only uproot3

# uproot3_only.py
import numpy as np
import uproot3

if __name__ == "__main__":
    bins = np.arange(0, 8)
    counts = np.array([2 ** x for x in range(len(bins[:-1]))])
    with uproot3.recreate("test.root", compression=uproot3.ZLIB(4)) as outfile:
        outfile["data"] = (counts, bins)

    root_file = uproot3.open("test.root")
    hist = root_file["data"]

    print(f"hist._fSumw2 ({type(hist._fSumw2)}): {hist._fSumw2}")
    print(f"hist values ({type(hist.values)}): {hist.values}")
    print(f"hist variances ({type(hist.variances)}): {hist.variances}")
$ python uproot3_only.py 
hist._fSumw2 (<class 'uproot3.rootio.TArrayD'>): [0.0, 1.0, 4.0, 16.0, 64.0, 256.0, 1024.0, 4096.0, 0.0]
hist values (<class 'numpy.ndarray'>): [ 1  2  4  8 16 32 64]
hist variances (<class 'numpy.ndarray'>): [1.000e+00 4.000e+00 1.600e+01 6.400e+01 2.560e+02 1.024e+03 4.096e+03]

So won't it always have _fSumw2 from

out._fSumw2 = valuesarray ** 2

and then the variances are set from it? :/ I may be missing something obvious, but from what Henry has pointed out it seems like this will always be the case if you're writing a histogram from numpy arrays.

from uproot3-methods.

kratsg avatar kratsg commented on August 19, 2024

@henryiii to be clear, is this just boiling down to whether one needs to do sum(values)**2 versus sum(x**2 for x in values) ?

from uproot3-methods.

matthewfeickert avatar matthewfeickert commented on August 19, 2024

I think I now understand that this is also highlighting a difference in how ROOT and uproot3 write files given that

# example_pyroot.py
import uproot
from ROOT import TH1F, TFile, kFALSE, kTRUE


def write_root_file(filename):
    n_bins = 7
    hist = TH1F("data", "", n_bins, 0, n_bins)
    hist.SetStats(kFALSE)

    for bin_idx in range(0, n_bins):
        hist.SetBinContent(bin_idx + 1, 2 ** bin_idx)

    hist.Sumw2(kTRUE)

    write_file = TFile(filename, "recreate")
    hist.Write()
    write_file.Close()


if __name__ == "__main__":
    write_root_file("pyroot_output.root")

    root_file = uproot.open("pyroot_output.root")
    hist = root_file["data"]

    print(f"hist has weights: {hist.weighted}")
    print(f"hist values: {hist.values()}")
    print(f"hist errors: {hist.errors()}")
    print(f"hist variances: {hist.variances()}")

gives

$ python example_pyroot.py 
hist has weights: True
hist values: [ 1.  2.  4.  8. 16. 32. 64.]
hist errors: [1.         1.41421356 2.         2.82842712 4.         5.65685425
 8.        ]
hist variances: [ 1.  2.  4.  8. 16. 32. 64.]

I need to think on this more, as I obviously don't understand the problem as well as everyone else, but would changing the behavior of TH1's from_numpy be enough and be correct?

from uproot3-methods.

henryiii avatar henryiii commented on August 19, 2024

I think the above answers @kratsg's question - yes, when it says "sumw2", it's each weight that's squared, not the sum of them, so setting this with valuesarray ** 2 is only correct if there was only one weight in the sum.

from uproot3-methods.

matthewfeickert avatar matthewfeickert commented on August 19, 2024

@jpivarski If you're happy with the solution Henry poses above I have a branch on uproot3-methods that implements this that I'll make a PR from.

from uproot3-methods.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.