Accept lists of files (root_pandas issue, closed)

scikit-hep commented on June 11, 2024

Accept lists of files

from root_pandas.

Comments (4)

ibab commented on June 11, 2024

This would be a useful PR.
Would be nice to just load several files at once instead of using hadd to concatenate them.

Yeah, I agree. root_numpy might also check if the requested branches are in each of the files, so it might be that we don't have to do a lot here.
Right, that bit needs to be changed.
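
A rough, ROOT-free sketch of the branch check discussed above, for illustration only. It does not show what root_numpy actually does: `missing_branches` and the `branches_by_file` mapping are hypothetical names, and in practice the per-file branch lists would presumably come from something like root_pandas's `list_branches`.

```python
def missing_branches(branches_by_file, requested):
    """Map each file path to the requested branches it lacks.

    branches_by_file: dict of {path: iterable of branch names} (hypothetical input).
    requested: list of branch names the caller wants to read.
    Files that contain every requested branch are left out of the result.
    """
    missing = {}
    for path, available in branches_by_file.items():
        available = set(available)
        absent = [b for b in requested if b not in available]
        if absent:
            missing[path] = absent
    return missing


# Made-up file contents, purely for demonstration.
example = {
    "a.root": ["A", "B", "C"],
    "b.root": ["A", "C"],
}
print(missing_branches(example, ["A", "B"]))
```

An empty result would mean every file has all the requested branches, so reading them together should be safe on that front.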

alexpearce commented on June 11, 2024

You broke all the things so do this yourself:

diff --git a/root_pandas/__init__.py b/root_pandas/__init__.py
index 7345770..43f22f7 100644
--- a/root_pandas/__init__.py
+++ b/root_pandas/__init__.py
@@ -54,16 +54,16 @@ def get_matching_variables(branches, patterns, fail=True):
             raise ValueError("Pattern '{}' didn't match any branch".format(p))
     return selected

-def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, where=None, *kargs, **kwargs):
+def read_root(paths, tree_key=None, columns=None, ignore=None, chunksize=None, where=None, *kargs, **kwargs):
     """
-    Read a ROOT file into a pandas DataFrame.
+    Read a ROOT file, or list of ROOT files, into a pandas DataFrame.
     Further *kargs and *kwargs are passed to root_numpy's root2array.
     If the root file contains a branch called index, it will become the DataFrame's index.

     Parameters
     ----------
-    path: string
-        The path to the root file
+    paths: string or list
+        The path(s) to the root file(s)
     tree_key: string
         The key of the tree to load.
     columns: str or sequence of str
@@ -85,14 +85,19 @@ def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, wh
         >>> df = read_root('test.root', 'MyTree', columns=['A{B,C}*', 'D'], where='ABB > 100')

     """
+    if not isinstance(paths, list):
+        paths = [paths]
+    # Use a single file to search for trees and branches
+    seed_path = paths[0]
+
     if not tree_key:
-        branches = list_trees(path)
+        branches = list_trees(seed_path)
         if len(branches) == 1:
             tree_key = branches[0]
         else:
-            raise ValueError('More than one tree found in {}'.format(path))
+            raise ValueError('More than one tree found in {}'.format(seed_path))

-    branches = list_branches(path, tree_key)
+    branches = list_branches(seed_path, tree_key)

     if not columns:
         all_vars = branches
@@ -117,16 +122,18 @@ def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, wh
             all_vars.remove(var)

     if chunksize:
-        f = ROOT.TFile(path)
-        n_entries = f.Get(tree_key).GetEntries()
-        f.Close()
+        tchain = ROOT.TChain(tree_key)
+        for path in paths:
+            tchain.Add(path)
+        n_entries = tchain.GetEntries()
+        # TODO could explicitly clean up the opened TFiles with TChain::Reset
         def genchunks():
             for chunk in range(int(ceil(float(n_entries) / chunksize))):
-                arr = root2array(path, tree_key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *kargs, **kwargs)
+                arr = root2array(paths, tree_key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *kargs, **kwargs)
                 yield convert_to_dataframe(arr)
         return genchunks()

-    arr = root2array(path, tree_key, all_vars, selection=where, *kargs, **kwargs)
+    arr = root2array(paths, tree_key, all_vars, selection=where, *kargs, **kwargs)
     return convert_to_dataframe(arr)

 def convert_to_dataframe(array):

P.S. you're not the coolest guy ever (anymore) 😎
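
For anyone reading the diff above, here is a small sketch of its two moving parts that runs without ROOT: wrapping a single path in a list, and working out the start/stop entry ranges that get passed to root2array for each chunk. The helper names `as_path_list` and `chunk_bounds` are made up for illustration and are not part of root_pandas.

```python
from math import ceil


def as_path_list(paths):
    # Mirrors the diff: anything that is not a list is wrapped in one.
    if not isinstance(paths, list):
        paths = [paths]
    return paths


def chunk_bounds(n_entries, chunksize):
    # Same arithmetic as genchunks() in the diff: one (start, stop) pair per chunk.
    n_chunks = int(ceil(float(n_entries) / chunksize))
    return [(i * chunksize, (i + 1) * chunksize) for i in range(n_chunks)]


print(as_path_list("test.root"))
print(chunk_bounds(10, 4))
```

Two things worth noting. The last stop value can run past n_entries, exactly as in the diff, which leaves it to root2array to deal with the overshoot. And because the check is `isinstance(paths, list)`, a tuple of paths would be wrapped as a single element rather than treated as several files.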

ibab commented on June 11, 2024

Okay, I've applied the changes and committed under your name 👼

alexpearce commented on June 11, 2024

🙏
