Comments (14)
However, there is a name clash, but that's a minor problem.
I looked through your code, but it seems to have a slightly different signature and a subtly different limitation.
tree_to_recarray(trees, branches=None,
use_cache=False, cache_size=1000000,
include_weight=False,
weight_name='weight',
weight_dtype='f4')
vs
root2array(fnames,treename,branches=None)
The reason I chose this signature is that I have absolutely no idea how to get a C++ TTree from a PyROOT TTree object (so that I can avoid all Python function calls). Do you happen to know how? So the subtle difference is that root_numpy.root2array needs a tree from a file, while tree_to_recarray can probably read an in-memory TTree as well.
How would you like to include it?
If you want to include it as it is, just copy the files over, change setup.py, and rename the methods. But you will then have two methods that serve a very similar purpose, which is kind of odd, though I think people can probably live with it. Bridging them will require knowing how to get a C++ TTree from a PyROOT TTree object. The PyROOT header doesn't seem to provide such a thing. Otherwise it just defeats the purpose of having a C extension.
I'm not even sure if it compiles under Windows. My setup.py depends on a call to root-config, so users may have trouble installing rootpy because of this extension.
One more thing: the numpy boolean type string is actually "bool". "b" means a byte or something else that I don't quite understand. This took me a good chunk of time to figure out. I know the documentation says "b", but that doesn't work.
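For reference, a small sketch of the distinction, using only numpy's dtype constructor: the one-character code "b" is a signed byte (int8), while "bool" (or the one-character code "?") is the boolean type. The confusion may come from the fact that the dtype kind code for booleans is also "b". The field names "flag" and "n" below are made up for illustration.

```python
import numpy as np

# The single-character type code 'b' is a signed 8-bit integer, not a boolean.
byte_dtype = np.dtype('b')

# Booleans are spelled 'bool' or, as a single character, '?'.
bool_dtype = np.dtype('bool')
bool_short = np.dtype('?')

# The *kind* code of a boolean dtype is 'b', which is easy to mix up
# with the type code 'b' above.
print(byte_dtype.name, byte_dtype.kind)
print(bool_dtype.name, bool_dtype.kind)

# A structured array with one boolean and one byte column
# (field names are hypothetical).
rec = np.zeros(2, dtype=[('flag', 'bool'), ('n', 'b')])
print(rec.dtype)
```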
from rootpy.
This is very cool, especially that it avoids pyroot altogether, which I
would imagine makes it much more portable. It's not clear to me yet what
parts of rootpy this could be useful for.
On Wed, Mar 28, 2012 at 6:47 AM, Noel Dawe wrote:
Check this out:
https://github.com/piti118/root_numpy
Reply to this email directly or view it on GitHub:
#13
from rootpy.
That's one of the issues as well.
The philosophy behind root_numpy is to avoid PyROOT altogether and have the user do everything in standard numpy/matplotlib while still being able to read data from ROOT files.
I even patched numpy so that the column names of a recarray auto-complete in IPython. I found that my workflow is greatly improved with root_numpy/numpy/matplotlib/IPython notebook.
But having said that, feel free to take the code.
from rootpy.
@piti118 thanks again for this great package. The idea is that root_numpy could be used by tree_to_recarray to convert a TTree into a numpy array and avoid all the overhead of looping in Python (which currently makes this method slow).
Yes, tree_to_recarray currently reads the tree in memory. You might be able to pass a TTree (or a Tree in the rootpy framework, which inherits from TTree) to the C extension with ROOT.AsCObject(t), where t is a TTree, which gives you a PyCObject. The C extension can then cast it to a TTree:
>>> from rootpy.io import File
>>> from rootpy.tree import Tree
>>> a = File('test.root', 'recreate')
>>> t = Tree()
>>> t
Tree('44e831cd3e7b4a228ddf279aa285b0d9')
>>> import ROOT
>>> ROOT.AsCObject(t)
<PyCObject object at 0x378b0a8>
No worries about API differences. These things can be sorted out later and rootpy's API isn't written in stone yet.
One of the major goals of rootpy is to provide a way to easily integrate ROOT within the vast ecosystem of scientific Python packages like numpy, scipy, matplotlib, pytables, scikit-learn, etc... and root_numpy would be a very nice improvement to rootpy's numpy interface. I fear that too many physicists are limiting themselves to ROOT and have no idea that any of these other very powerful frameworks exist.
Would you be interested in maintaining root_numpy as a subpackage of rootpy? Or, if you prefer to continue maintaining it as a separate package, we could simply add your git repository as a subrepository (submodule) of rootpy's repository.
from rootpy.
I'll try PyCObject.
In that case, I think maintaining it as a subrepository or under the rootpy organization is a better idea, in case people have trouble compiling the C extension.
Then tree_to_recarray can first check whether it can import root_numpy (the import raises ImportError if it can't). If it can, it calls the faster C extension; otherwise it calls the pure Python code.
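A minimal sketch of that fallback, assuming the compiled path is exposed as root_numpy.pyroot2rec (the name mentioned later in this thread). The helper names load_fast_converter and slow_converter, the loader parameter, and the list-of-tuples stand-in for the pure Python loop are all hypothetical and only illustrate the dispatch:

```python
def load_fast_converter():
    """Return root_numpy.pyroot2rec if the extension is importable, else None."""
    try:
        import root_numpy  # optional compiled extension; may be absent
    except ImportError:
        return None
    # getattr guards against a root_numpy build that lacks this function.
    return getattr(root_numpy, "pyroot2rec", None)


def slow_converter(tree):
    """Hypothetical stand-in for the pure-Python loop over tree entries."""
    return [tuple(entry) for entry in tree]


def tree_to_recarray(tree, loader=load_fast_converter):
    """Use the C extension when available, otherwise the pure-Python path."""
    convert = loader() or slow_converter
    return convert(tree)
```

Passing the loader as a parameter is just a convenience so that the fallback branch can be exercised without uninstalling the extension.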
from rootpy.
Done. Check out the head: root_numpy.pyroot2rec and root_numpy.pyroot2array.
It's more subtle than I thought, because PyCObject is deprecated in Python 2.7 and 3.1 and removed in Python 3.2. It's being replaced with capsules, but capsules don't exist in Python 2.6, and ROOT doesn't provide an AsCapsule interface yet -_-". But it's all good now with C preprocessor directives.
See: http://docs.python.org/howto/cporting.html
from rootpy.
Thanks! I can try integrating root_numpy into rootpy tomorrow. Thanks again, that was fast.
from rootpy.
It would also be nice if you could do some benchmarks of pure Python vs. this one.
from rootpy.
A huge improvement (see rootpy/benchmarks/tree/root2array):
./test.py
Using pure Python method...
4400080 function calls in 19.807 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 19.807 19.807 <string>:1(<module>)
800000 1.999 0.000 1.999 0.000 records.py:223(__getattribute__)
1 0.000 0.000 0.000 0.000 records.py:390(__new__)
1 0.000 0.000 0.000 0.000 records.py:407(__getattribute__)
800000 2.806 0.000 7.594 0.000 records.py:456(__getitem__)
1 5.339 5.339 19.807 19.807 root2array.py:37(tree_to_recarray)
100001 0.334 0.000 0.463 0.000 tree.py:1025(reset_collections)
1 0.000 0.000 0.000 0.000 tree.py:210(use_cache)
100001 1.757 0.000 2.220 0.000 tree.py:243(GetEntry)
100001 0.589 0.000 2.808 0.000 tree.py:248(__iter__)
1 0.000 0.000 0.000 0.000 tree.py:274(__setattr__)
1 0.000 0.000 0.000 0.000 tree.py:448(GetEntries)
1 0.000 0.000 0.000 0.000 tree.py:968(set_tree)
3 0.000 0.000 0.000 0.000 tree.py:980(__setattr__)
8 0.000 0.000 0.000 0.000 types.py:541(convert)
800000 2.491 0.000 4.066 0.000 types.py:64(value)
800000 1.575 0.000 1.575 0.000 types.py:85(__getitem__)
1 0.000 0.000 0.000 0.000 utils.py:14(asrootpy)
1 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x84e840}
800014 2.789 0.000 4.788 0.000 {isinstance}
8 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
6 0.000 0.000 0.000 0.000 {method 'has_key' of 'dict' objects}
8 0.000 0.000 0.000 0.000 {method 'index' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
100001 0.129 0.000 0.129 0.000 {method 'iterkeys' of 'dict' objects}
16 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
1 0.000 0.000 0.000 0.000 {sum}
time without profiler overhead:
8.278806 seconds
========================================
Using compiled C extension...
Warning: unknown root type: vector<float> skip
Warning: unknown root type: TLorentzVector skip
Warning: unknown root type: vector<float> skip
Warning: unknown root type: vector<float> skip
16 function calls in 0.459 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.459 0.459 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 ROOT.py:416(__getattr2)
1 0.000 0.000 0.455 0.455 __init__.py:44(pyroot2array)
1 0.000 0.000 0.455 0.455 __init__.py:66(pyroot2rec)
2 0.000 0.000 0.000 0.000 records.py:407(__getattribute__)
1 0.000 0.000 0.459 0.459 root2array.py:20(tree_to_recarray_c)
1 0.000 0.000 0.000 0.000 {dir}
2 0.000 0.000 0.000 0.000 {isinstance}
1 0.000 0.000 0.000 0.000 {libPyROOT.AsCObject}
1 0.000 0.000 0.000 0.000 {libPyROOT.LookupRootEntity}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'view' of 'numpy.ndarray' objects}
1 0.004 0.004 0.004 0.004 {numpy.core.multiarray.concatenate}
1 0.455 0.455 0.455 0.455 {rootpy.root2array.root_numpy.src.croot_numpy.root2array_from_cobj}
time without profiler overhead:
Warning: unknown root type: vector<float> skip
Warning: unknown root type: TLorentzVector skip
Warning: unknown root type: vector<float> skip
Warning: unknown root type: vector<float> skip
0.453431 seconds
========================================
Comparison of output:
[ (8, 0, 52.7789192199707, 3.788414478302002, -0.9572163224220276, -1.6373909711837769, 2.4526565074920654, -22.792055130004883)
(1, 1, -10.197294235229492, -0.13024774193763733, 1.648690104484558, 0.7045357823371887, 1.670132040977478, 64.8749771118164)
(8, 2, 58.47273254394531, 1.918209195137024, 1.3651632070541382, -0.28253698348999023, 2.201446771621704, 16.73053550720215)
...,
(4, 99997, 110.29533386230469, -0.5117789506912231, -0.01048351638019085, 0.11074820905923843, -2.3150572776794434, -10.907495498657227)
(6, 99998, -33.58203125, -2.158933401107788, 1.2362463474273682, -0.6049940586090088, 4.583289623260498, 1.942242980003357)
(1, 99999, 22.445335388183594, 3.3832831382751465, -0.26965275406837463, -0.7595866322517395, -2.5078225135803223, 16.156869888305664)]
[ (8, 0, 52.7789192199707, 3.788414478302002, -0.9572163224220276, -1.6373909711837769, 2.4526565074920654, -22.792055130004883)
(1, 1, -10.197294235229492, -0.13024774193763733, 1.648690104484558, 0.7045357823371887, 1.670132040977478, 64.8749771118164)
(8, 2, 58.47273254394531, 1.918209195137024, 1.3651632070541382, -0.28253698348999023, 2.201446771621704, 16.73053550720215)
...,
(4, 99997, 110.29533386230469, -0.5117789506912231, -0.01048351638019085, 0.11074820905923843, -2.3150572776794434, -10.907495498657227)
(6, 99998, -33.58203125, -2.158933401107788, 1.2362463474273682, -0.6049940586090088, 4.583289623260498, 1.942242980003357)
(1, 99999, 22.445335388183594, 3.3832831382751465, -0.26965275406837463, -0.7595866322517395, -2.5078225135803223, 16.156869888305664)]
[-0.95721632 1.6486901 1.36516321 ..., -0.01048352 1.23624635
-0.26965275]
[-0.95721632 1.6486901 1.36516321 ..., -0.01048352 1.23624635
-0.26965275]
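For readers who want to reproduce this kind of comparison, here is a rough sketch of a harness that produces a profile report sorted by standard name plus a separate wall-clock time without profiler overhead. It is not the actual rootpy benchmark script; profile_and_time is a hypothetical helper, and only the two timings at the bottom are taken from the output above.

```python
import cProfile
import io
import pstats
import time


def profile_and_time(func, *args):
    """Profile one call, then time a second call without the profiler."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Render the profile the same way as above: ordered by standard name.
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("stdname").print_stats()

    # Second, unprofiled run to measure time without profiler overhead.
    start = time.perf_counter()
    func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, stream.getvalue()


# Timings reported above, in seconds (pure Python vs. C extension).
pure_python_seconds = 8.278806
c_extension_seconds = 0.453431
speedup = pure_python_seconds / c_extension_seconds
print("speedup: %.1fx" % speedup)
```

With the numbers reported above, the C extension comes out roughly 18 times faster than the pure Python loop on this tree.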
from rootpy.
Awesome
from rootpy.
root_numpy has been working very well! Already using it in my analysis...
Just some comments: It's fine that root_numpy skips branches that are not basic types if branches is not specified, but I think it should raise a TypeError if the user specifies a branch in branches that is not of a basic type.
There also seems to be a problem if a branch in branches does not exist (I got a segfault). In this case maybe raising a ValueError is best.
I made a few modifications on our rootpy branch of root_numpy. One is to allow empty trees. I think it's fine to return an empty array in this case.
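The proposed behaviour could look roughly like the sketch below. It is not root_numpy's code: check_branches is a hypothetical helper, the available mapping of branch name to ROOT type name is an assumed input, and BASIC_TYPES is an illustrative subset of ROOT's basic type names.

```python
# Illustrative subset of ROOT basic type names (assumption, not exhaustive).
BASIC_TYPES = {"Char_t", "Bool_t", "Short_t", "Int_t", "Long64_t",
               "Float_t", "Double_t"}


def check_branches(available, requested=None):
    """Pick the branches to convert.

    available: dict mapping branch name -> ROOT type name.
    requested: explicit list of branch names, or None for "all convertible".
    """
    if requested is None:
        # No explicit selection: silently skip non-basic branches.
        return [name for name, kind in available.items() if kind in BASIC_TYPES]

    for name in requested:
        if name not in available:
            # A missing branch is a bad value, not a crash.
            raise ValueError("branch %r does not exist" % name)
        if available[name] not in BASIC_TYPES:
            # An explicitly requested branch of an unsupported type is an error.
            raise TypeError("branch %r has unsupported type %r"
                            % (name, available[name]))
    return list(requested)
```

For example, with available = {"pt": "Float_t", "jets": "vector<float>"}, calling check_branches(available) keeps only "pt", while explicitly requesting "jets" raises a TypeError.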
from rootpy.
Fixed the segfault (I believe). Check the head.
from rootpy.
The rationale for raising an error on an empty tree is that it's usually caused by a typo in the filename, and I aim for this to be used in an interactive environment. Maybe I should raise something else, like a file-not-found exception.
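One way to separate the two cases, sketched under the assumption that the filenames are plain local paths: check for missing files up front and raise a file-not-found error, so that an empty tree in an existing file can still yield an empty array. The helpers find_missing and require_files are hypothetical and not part of root_numpy.

```python
import os


def find_missing(fnames):
    """Return the subset of fnames that does not exist on disk."""
    if isinstance(fnames, str):
        fnames = [fnames]  # accept a single filename as well as a list
    return [name for name in fnames if not os.path.exists(name)]


def require_files(fnames):
    """Raise FileNotFoundError if any input file is missing."""
    missing = find_missing(fnames)
    if missing:
        raise FileNotFoundError("no such file(s): " + ", ".join(missing))
```

A filename typo then fails early with a clear message instead of surfacing later as an unexpectedly empty tree.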
from rootpy.
Can this issue be closed and/or broken into new issues? @ndawe @piti118
from rootpy.