Comments (4)
This would be a useful PR.
Would be nice to just load several files at once instead of using hadd to concatenate them.
- Yeah, I agree.
root_numpy might also check if the requested branches are in each of the files, so it might be that we don't have to do a lot here.
- Right, that bit needs to be changed.
from root_pandas.
You broke all the things so do this yourself:
diff --git a/root_pandas/__init__.py b/root_pandas/__init__.py
index 7345770..43f22f7 100644
--- a/root_pandas/__init__.py
+++ b/root_pandas/__init__.py
@@ -54,16 +54,16 @@ def get_matching_variables(branches, patterns, fail=True):
             raise ValueError("Pattern '{}' didn't match any branch".format(p))
     return selected
 
-def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, where=None, *kargs, **kwargs):
+def read_root(paths, tree_key=None, columns=None, ignore=None, chunksize=None, where=None, *kargs, **kwargs):
     """
-    Read a ROOT file into a pandas DataFrame.
+    Read a ROOT file, or list of ROOT files, into a pandas DataFrame.
     Further *kargs and *kwargs are passed to root_numpy's root2array.
     If the root file contains a branch called index, it will become the DataFrame's index.
 
     Parameters
     ----------
-    path: string
-        The path to the root file
+    paths: string or list
+        The path(s) to the root file(s)
     tree_key: string
         The key of the tree to load.
     columns: str or sequence of str
@@ -85,14 +85,19 @@ def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, wh
         >>> df = read_root('test.root', 'MyTree', columns=['A{B,C}*', 'D'], where='ABB > 100')
 
     """
+    if not isinstance(paths, list):
+        paths = [paths]
+    # Use a single file to search for trees and branches
+    seed_path = paths[0]
+
     if not tree_key:
-        branches = list_trees(path)
+        branches = list_trees(seed_path)
         if len(branches) == 1:
             tree_key = branches[0]
         else:
-            raise ValueError('More than one tree found in {}'.format(path))
+            raise ValueError('More than one tree found in {}'.format(seed_path))
 
-    branches = list_branches(path, tree_key)
+    branches = list_branches(seed_path, tree_key)
 
     if not columns:
         all_vars = branches
@@ -117,16 +122,18 @@ def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, wh
             all_vars.remove(var)
 
     if chunksize:
-        f = ROOT.TFile(path)
-        n_entries = f.Get(tree_key).GetEntries()
-        f.Close()
+        tchain = ROOT.TChain(tree_key)
+        for path in paths:
+            tchain.Add(path)
+        n_entries = tchain.GetEntries()
+        # TODO could explicitly clean up the opened TFiles with TChain::Reset
         def genchunks():
             for chunk in range(int(ceil(float(n_entries) / chunksize))):
-                arr = root2array(path, tree_key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *kargs, **kwargs)
+                arr = root2array(paths, tree_key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *kargs, **kwargs)
                 yield convert_to_dataframe(arr)
         return genchunks()
 
-    arr = root2array(path, tree_key, all_vars, selection=where, *kargs, **kwargs)
+    arr = root2array(paths, tree_key, all_vars, selection=where, *kargs, **kwargs)
     return convert_to_dataframe(arr)
 
 def convert_to_dataframe(array):
P.S. you're not the coolest guy ever (anymore) 😎
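For anyone following the patch, here is a minimal pure-Python sketch of the two bits of logic it adds: wrapping a single path into a list, and working out the start/stop range of each chunk. It does not touch ROOT or root_numpy. The helper names `normalize_paths` and `chunk_ranges` are made up for illustration and are not part of root_pandas. Accepting tuples as well as lists is an assumption that goes beyond the patch, which only checks `isinstance(paths, list)`.

```python
from math import ceil


def normalize_paths(paths):
    """Return a list of paths, wrapping a single string in a list.

    Hypothetical helper, not in root_pandas. Unlike the patch, it also
    accepts tuples and other iterables of strings (an assumption).
    """
    if isinstance(paths, str):
        return [paths]
    return list(paths)


def chunk_ranges(n_entries, chunksize):
    """Yield (start, stop) pairs covering n_entries in steps of chunksize.

    Uses the same arithmetic as genchunks() in the patch, so the final
    stop can be larger than n_entries.
    """
    n_chunks = int(ceil(float(n_entries) / chunksize))
    for chunk in range(n_chunks):
        yield chunk * chunksize, (chunk + 1) * chunksize


if __name__ == "__main__":
    print(normalize_paths("a.root"))
    print(list(chunk_ranges(10, 4)))
```

With 10 entries and a chunksize of 4 this gives three ranges, and the last one runs past the end of the data, just as the `stop=(chunk+1) * chunksize` argument does in the patch. With zero entries it yields no ranges at all, so the resulting generator is empty.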
Okay, I've applied the changes and committed under your name 👼
🙏