
Comments (14)

piti118 commented on May 20, 2024

However, there is a name clash, but that's a minor problem.
I looked through your code, and it seems to have a slightly different signature and a subtle difference in limitations.

tree_to_recarray(trees, branches=None,
use_cache=False, cache_size=1000000,
include_weight=False,
weight_name='weight',
weight_dtype='f4')
vs

root2array(fnames,treename,branches=None)

The reason I chose this signature is that I have absolutely no idea how to get a C++ TTree from a PyROOT TTree object (so that I can avoid all Python function calls). Do you happen to know how? So the subtle difference is that root_numpy.root2array needs a tree from a file, while tree_to_recarray can probably read an in-memory TTree as well.

How would you like to include it?

If you want to include it as it is, just copy the files over, change setup.py, and rename the methods. But you will have two methods that serve very similar purposes, which is kind of odd, though I think people can probably live with it. Bridging them will require knowing how to get a C++ TTree from a PyROOT TTree object. The PyROOT header doesn't have such a thing. Without that, it just defeats the purpose of having a C extension.

I'm not even sure if it compiles under Windows. My setup.py depends on a call to root-config, so users may have trouble installing rootpy because of this extension.

One more thing: the numpy type string for booleans is actually "bool". "b" means a byte (or something else that I don't quite understand). This took me a good chunk of time to figure out. I know the documentation says "b", but that doesn't work.
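The distinction can be checked directly in numpy. This is a small sketch, not taken from either package: in numpy's character codes, "?" (or the name "bool") denotes the boolean type, while "b" denotes a signed 8-bit integer. The variable names are only for illustration.

```python
import numpy as np

# "bool" and "?" both name the boolean dtype.
bool_dtype = np.dtype("bool")
question_dtype = np.dtype("?")

# "b" is the character code for a signed 8-bit integer, not a boolean.
byte_dtype = np.dtype("b")

# The same input values are interpreted differently under each dtype.
flags = np.array([1, 0, 2], dtype="bool")   # truth values
as_bytes = np.array([1, 0, 2], dtype="b")   # small integers

print(bool_dtype, byte_dtype)
print(flags, as_bytes)
```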

from rootpy.

jklukas commented on May 20, 2024

This is very cool, especially that it avoids pyroot altogether, which I
would imagine makes it much more portable. It's not clear to me yet what
parts of rootpy this could be useful for.

On Wed, Mar 28, 2012 at 6:47 AM, Noel Dawe wrote:

> Check this out:
>
> https://github.com/piti118/root_numpy


piti118 commented on May 20, 2024

That's one of the issues as well.

The philosophy behind root_numpy is to avoid PyROOT altogether and have the user do everything in standard numpy/matplotlib while still being able to read data from a ROOT file.

I even patched numpy so that the column names of a recarray auto-complete in IPython. I found that my workflow is greatly improved with root_numpy/numpy/matplotlib/IPython notebook.

But having said that, feel free to take the code.


ndawe commented on May 20, 2024

@piti118 thanks again for this great package. The idea is that root_numpy could be used by tree_to_recarray to convert a TTree into a numpy array and avoid all the overhead of looping in Python (currently making this method slow).

Yes, tree_to_recarray currently reads the tree in memory. You might be able to pass a TTree (or a Tree in the rootpy framework, which inherits from TTree) to the C extension with ROOT.AsCObject(t), where t is a TTree; this gives you a PyCObject. The C extension can then cast it to a TTree:

>>> from rootpy.io import File
>>> from rootpy.tree import Tree
>>> a = File('test.root', 'recreate')
>>> t = Tree()
>>> t
Tree('44e831cd3e7b4a228ddf279aa285b0d9')
>>> import ROOT
>>> ROOT.AsCObject(t)
<PyCObject object at 0x378b0a8>

No worries about API differences. These things can be sorted out later and rootpy's API isn't written in stone yet.

One of the major goals of rootpy is to provide a way to easily integrate ROOT within the vast ecosystem of scientific Python packages like numpy, scipy, matplotlib, pytables, scikit-learn, etc... and root_numpy would be a very nice improvement to rootpy's numpy interface. I fear that too many physicists are limiting themselves to ROOT and have no idea that any of these other very powerful frameworks exist.

Would you be interested in maintaining root_numpy as a subpackage of rootpy? Or, if you prefer to continue maintaining it as a separate package, we could simply add your git repository as a subrepository (submodule) of rootpy's repository.


piti118 commented on May 20, 2024

I'll try PyCObject.

In that case, I think maintaining it as a subrepository, or under the rootpy organization, is a better idea in case people have trouble compiling the C extension.

Then tree_to_recarray can first check whether it can import root_numpy (a failed import raises ImportError). If it can, it calls the faster C extension; otherwise it falls back to the pure Python code.
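The fallback described above could look roughly like the following. This is only a sketch: `pick_backend` and `slow_python_version` are hypothetical names, and the use of `root_numpy.pyroot2rec` as the fast path is an assumption about how rootpy would wire it in, not the actual rootpy code.

```python
import importlib


def pick_backend(module_name, attr_name, fallback):
    """Return module_name.attr_name if the module imports, else fallback.

    Hypothetical helper: it only illustrates the ImportError-based
    fallback pattern for an optional compiled extension.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Extension not installed or failed to build: use pure Python.
        return fallback
    return getattr(module, attr_name, fallback)


def slow_python_version(tree, branches=None):
    # Stand-in for the existing pure Python loop over the tree.
    raise NotImplementedError("pure Python conversion goes here")


# Prefer the C extension when it is importable (assumed entry point).
tree_to_recarray_impl = pick_backend(
    "root_numpy", "pyroot2rec", slow_python_version
)
```

With this arrangement, users who cannot compile the extension still get a working, if slower, tree_to_recarray.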


piti118 commented on May 20, 2024

Done. Check out the head:
root_numpy.pyroot2rec
and
root_numpy.pyroot2array

It's more subtle than I thought, because PyCObject is deprecated in Python 2.7 and 3.1 and removed in Python 3.2. It's being replaced with capsules, but capsules don't exist in Python 2.6, and ROOT doesn't provide an AsCapsule interface yet -_-". But it's all good now with a C preprocessor directive.

See: http://docs.python.org/howto/cporting.html


ndawe commented on May 20, 2024

Thanks! I can try integrating root_numpy into rootpy tomorrow. Thanks again, that was fast.


piti118 commented on May 20, 2024

It would also be nice if you could benchmark the pure Python version against this one.


ndawe commented on May 20, 2024

A huge improvement (see rootpy/benchmarks/tree/root2array):

./test.py 
Using pure Python method...
         4400080 function calls in 19.807 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   19.807   19.807 <string>:1(<module>)
   800000    1.999    0.000    1.999    0.000 records.py:223(__getattribute__)
        1    0.000    0.000    0.000    0.000 records.py:390(__new__)
        1    0.000    0.000    0.000    0.000 records.py:407(__getattribute__)
   800000    2.806    0.000    7.594    0.000 records.py:456(__getitem__)
        1    5.339    5.339   19.807   19.807 root2array.py:37(tree_to_recarray)
   100001    0.334    0.000    0.463    0.000 tree.py:1025(reset_collections)
        1    0.000    0.000    0.000    0.000 tree.py:210(use_cache)
   100001    1.757    0.000    2.220    0.000 tree.py:243(GetEntry)
   100001    0.589    0.000    2.808    0.000 tree.py:248(__iter__)
        1    0.000    0.000    0.000    0.000 tree.py:274(__setattr__)
        1    0.000    0.000    0.000    0.000 tree.py:448(GetEntries)
        1    0.000    0.000    0.000    0.000 tree.py:968(set_tree)
        3    0.000    0.000    0.000    0.000 tree.py:980(__setattr__)
        8    0.000    0.000    0.000    0.000 types.py:541(convert)
   800000    2.491    0.000    4.066    0.000 types.py:64(value)
   800000    1.575    0.000    1.575    0.000 types.py:85(__getitem__)
        1    0.000    0.000    0.000    0.000 utils.py:14(asrootpy)
        1    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x84e840}
   800014    2.789    0.000    4.788    0.000 {isinstance}
        8    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        6    0.000    0.000    0.000    0.000 {method 'has_key' of 'dict' objects}
        8    0.000    0.000    0.000    0.000 {method 'index' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
   100001    0.129    0.000    0.129    0.000 {method 'iterkeys' of 'dict' objects}
       16    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {sum}


time without profiler overhead:
8.278806 seconds
========================================
Using compiled C extension...
Warning: unknown root type: vector<float> skip 
Warning: unknown root type: TLorentzVector skip 
Warning: unknown root type: vector<float> skip 
Warning: unknown root type: vector<float> skip 
         16 function calls in 0.459 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.459    0.459 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 ROOT.py:416(__getattr2)
        1    0.000    0.000    0.455    0.455 __init__.py:44(pyroot2array)
        1    0.000    0.000    0.455    0.455 __init__.py:66(pyroot2rec)
        2    0.000    0.000    0.000    0.000 records.py:407(__getattribute__)
        1    0.000    0.000    0.459    0.459 root2array.py:20(tree_to_recarray_c)
        1    0.000    0.000    0.000    0.000 {dir}
        2    0.000    0.000    0.000    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {libPyROOT.AsCObject}
        1    0.000    0.000    0.000    0.000 {libPyROOT.LookupRootEntity}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'view' of 'numpy.ndarray' objects}
        1    0.004    0.004    0.004    0.004 {numpy.core.multiarray.concatenate}
        1    0.455    0.455    0.455    0.455 {rootpy.root2array.root_numpy.src.croot_numpy.root2array_from_cobj}


time without profiler overhead:
Warning: unknown root type: vector<float> skip 
Warning: unknown root type: TLorentzVector skip 
Warning: unknown root type: vector<float> skip 
Warning: unknown root type: vector<float> skip 
0.453431 seconds
========================================
Comparison of output:
[ (8, 0, 52.7789192199707, 3.788414478302002, -0.9572163224220276, -1.6373909711837769, 2.4526565074920654, -22.792055130004883)
 (1, 1, -10.197294235229492, -0.13024774193763733, 1.648690104484558, 0.7045357823371887, 1.670132040977478, 64.8749771118164)
 (8, 2, 58.47273254394531, 1.918209195137024, 1.3651632070541382, -0.28253698348999023, 2.201446771621704, 16.73053550720215)
 ...,
 (4, 99997, 110.29533386230469, -0.5117789506912231, -0.01048351638019085, 0.11074820905923843, -2.3150572776794434, -10.907495498657227)
 (6, 99998, -33.58203125, -2.158933401107788, 1.2362463474273682, -0.6049940586090088, 4.583289623260498, 1.942242980003357)
 (1, 99999, 22.445335388183594, 3.3832831382751465, -0.26965275406837463, -0.7595866322517395, -2.5078225135803223, 16.156869888305664)]
[ (8, 0, 52.7789192199707, 3.788414478302002, -0.9572163224220276, -1.6373909711837769, 2.4526565074920654, -22.792055130004883)
 (1, 1, -10.197294235229492, -0.13024774193763733, 1.648690104484558, 0.7045357823371887, 1.670132040977478, 64.8749771118164)
 (8, 2, 58.47273254394531, 1.918209195137024, 1.3651632070541382, -0.28253698348999023, 2.201446771621704, 16.73053550720215)
 ...,
 (4, 99997, 110.29533386230469, -0.5117789506912231, -0.01048351638019085, 0.11074820905923843, -2.3150572776794434, -10.907495498657227)
 (6, 99998, -33.58203125, -2.158933401107788, 1.2362463474273682, -0.6049940586090088, 4.583289623260498, 1.942242980003357)
 (1, 99999, 22.445335388183594, 3.3832831382751465, -0.26965275406837463, -0.7595866322517395, -2.5078225135803223, 16.156869888305664)]
[-0.95721632  1.6486901   1.36516321 ..., -0.01048352  1.23624635
 -0.26965275]
[-0.95721632  1.6486901   1.36516321 ..., -0.01048352  1.23624635
 -0.26965275]
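For reference, the two "time without profiler overhead" figures above (8.278806 s and 0.453431 s) work out to roughly an 18-fold speedup. A report of this shape can be produced with the standard library alone. The sketch below is not the actual rootpy benchmark script; `profile_and_time` is a hypothetical helper that profiles one call with cProfile, sorted by standard name, and then times a second call without the profiler attached.

```python
import cProfile
import io
import pstats
import time


def profile_and_time(func, *args):
    """Profile one call of func, then time a second call unprofiled."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Render the profile as text, ordered by standard name.
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("stdname").print_stats()

    # Time the same call again without profiler overhead.
    start = time.perf_counter()
    func(*args)
    elapsed = time.perf_counter() - start
    return result, stream.getvalue(), elapsed


if __name__ == "__main__":
    _, report, seconds = profile_and_time(sorted, list(range(100000, 0, -1)))
    print(report)
    print("time without profiler overhead:")
    print("%f seconds" % seconds)
```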


piti118 commented on May 20, 2024

Awesome


ndawe commented on May 20, 2024

root_numpy has been working very well! Already using it in my analysis...

Just some comments: It's fine that root_numpy skips branches that are not basic types if branches is not specified but I think it should raise a TypeError if the user specifies a branch in branches that is not of a basic type.

There also seems to be a problem if a branch in branches does not exist (I got a segfault). In this case maybe raising a ValueError is best.
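The behaviour proposed above could be sketched in Python as follows. Everything here is hypothetical: `check_branches`, the `branch_types` mapping, and the contents of `BASIC_TYPES` are made up for illustration and do not reflect root_numpy's internals, where the check would live in the C extension.

```python
# Hypothetical set of type names treated as "basic" in this sketch.
BASIC_TYPES = {"Int_t", "Float_t", "Double_t", "Bool_t"}


def check_branches(requested, branch_types, basic_types=BASIC_TYPES):
    """Return the branch names to convert, validating explicit requests.

    branch_types maps each branch name in the tree to its type name.
    """
    if requested is None:
        # No explicit request: silently skip non-basic branches.
        return [n for n, t in branch_types.items() if t in basic_types]
    for name in requested:
        if name not in branch_types:
            # A missing branch becomes an error instead of a segfault.
            raise ValueError("branch %r does not exist in the tree" % name)
        if branch_types[name] not in basic_types:
            raise TypeError(
                "branch %r has unsupported type %r"
                % (name, branch_types[name])
            )
    return list(requested)
```

For example, with a tree holding `{"pt": "Float_t", "jets": "vector<float>"}`, passing `branches=None` would return only `["pt"]`, while asking for `"jets"` explicitly would raise TypeError.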

I made a few modifications on our rootpy branch of root_numpy. One is to allow empty trees. I think it's fine to return an empty array in this case.
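As an illustration of what an empty result could look like, here is a numpy-only sketch (it does not show the actual modification on the rootpy branch). A zero-length record array still carries its field names and types, so downstream code can keep working; `empty_recarray` and the field names are hypothetical.

```python
import numpy as np


def empty_recarray(fields):
    """Build a zero-length record array with the given (name, type) fields."""
    return np.empty(0, dtype=fields).view(np.recarray)


# Hypothetical branch layout for an empty tree.
arr = empty_recarray([("pt", "f4"), ("n", "i4")])
print(len(arr), arr.dtype.names)
```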


piti118 commented on May 20, 2024

Fixed the segfault (I believe). Check out the head.


piti118 commented on May 20, 2024

The rationale for raising an error on an empty tree is that it's usually caused by a typo in the filename, and I aim for this to be used in an interactive environment. Maybe I should raise something else, like a file-not-found-ish exception.


pwaller commented on May 20, 2024

Can this issue be closed and/or broken into new issues? @ndawe @piti118
