
Comments (4)

ibab commented on June 11, 2024

This would be a useful PR.
Would be nice to just load several files at once instead of using hadd to concatenate them.

  1. Yeah, I agree. root_numpy might also check if the requested branches are in each of the files, so it might be that we don't have to do a lot here.
  2. Right, that bit needs to be changed.
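As a rough illustration of the per-file branch check mentioned in point 1, here is a minimal sketch. The helper name `find_missing_branches` and the `branches_by_file` mapping are hypothetical; in root_pandas the per-file lists could plausibly come from `list_branches`, but this sketch uses plain Python lists so it runs without ROOT. It is not a statement of what root_numpy actually does.

```python
def find_missing_branches(requested, branches_by_file):
    """Return {path: [missing branch names]} for files lacking a requested branch.

    `branches_by_file` is a hypothetical mapping from file path to the list of
    branch names found in that file's tree.
    """
    missing = {}
    for path, available in branches_by_file.items():
        available_set = set(available)
        absent = [name for name in requested if name not in available_set]
        if absent:
            missing[path] = absent
    return missing


if __name__ == "__main__":
    report = find_missing_branches(
        ["A", "B"],
        {"first.root": ["A", "B", "C"], "second.root": ["A"]},
    )
    print(report)
```

An empty result would mean every file provides all requested branches, so reading them together should be safe in that respect.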

from root_pandas.

alexpearce commented on June 11, 2024

You broke all the things so do this yourself:

diff --git a/root_pandas/__init__.py b/root_pandas/__init__.py
index 7345770..43f22f7 100644
--- a/root_pandas/__init__.py
+++ b/root_pandas/__init__.py
@@ -54,16 +54,16 @@ def get_matching_variables(branches, patterns, fail=True):
             raise ValueError("Pattern '{}' didn't match any branch".format(p))
     return selected

-def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, where=None, *kargs, **kwargs):
+def read_root(paths, tree_key=None, columns=None, ignore=None, chunksize=None, where=None, *kargs, **kwargs):
     """
-    Read a ROOT file into a pandas DataFrame.
+    Read a ROOT file, or list of ROOT files, into a pandas DataFrame.
     Further *kargs and *kwargs are passed to root_numpy's root2array.
     If the root file contains a branch called index, it will become the DataFrame's index.

     Parameters
     ----------
-    path: string
-        The path to the root file
+    paths: string or list
+        The path(s) to the root file(s)
     tree_key: string
         The key of the tree to load.
     columns: str or sequence of str
@@ -85,14 +85,19 @@ def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, wh
         >>> df = read_root('test.root', 'MyTree', columns=['A{B,C}*', 'D'], where='ABB > 100')

     """
+    if not isinstance(paths, list):
+        paths = [paths]
+    # Use a single file to search for trees and branches
+    seed_path = paths[0]
+
     if not tree_key:
-        branches = list_trees(path)
+        branches = list_trees(seed_path)
         if len(branches) == 1:
             tree_key = branches[0]
         else:
-            raise ValueError('More than one tree found in {}'.format(path))
+            raise ValueError('More than one tree found in {}'.format(seed_path))

-    branches = list_branches(path, tree_key)
+    branches = list_branches(seed_path, tree_key)

     if not columns:
         all_vars = branches
@@ -117,16 +122,18 @@ def read_root(path, tree_key=None, columns=None, ignore=None, chunksize=None, wh
             all_vars.remove(var)

     if chunksize:
-        f = ROOT.TFile(path)
-        n_entries = f.Get(tree_key).GetEntries()
-        f.Close()
+        tchain = ROOT.TChain(tree_key)
+        for path in paths:
+            tchain.Add(path)
+        n_entries = tchain.GetEntries()
+        # TODO could explicitly clean up the opened TFiles with TChain::Reset
         def genchunks():
             for chunk in range(int(ceil(float(n_entries) / chunksize))):
-                arr = root2array(path, tree_key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *kargs, **kwargs)
+                arr = root2array(paths, tree_key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *kargs, **kwargs)
                 yield convert_to_dataframe(arr)
         return genchunks()

-    arr = root2array(path, tree_key, all_vars, selection=where, *kargs, **kwargs)
+    arr = root2array(paths, tree_key, all_vars, selection=where, *kargs, **kwargs)
     return convert_to_dataframe(arr)

 def convert_to_dataframe(array):
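
The two pure-Python pieces of the patch above, wrapping a single path in a list and splitting the entry count into chunk ranges, can be tried without ROOT. This is a sketch only: `normalize_paths` and `chunk_bounds` are hypothetical names, tuples are accepted here although the patch only checks for `list`, and the final `stop` is clamped to `n_entries`, which the patch itself does not do.

```python
from math import ceil


def normalize_paths(paths):
    """Wrap a single path in a list; copy lists and tuples into a new list."""
    if isinstance(paths, (list, tuple)):
        return list(paths)
    return [paths]


def chunk_bounds(n_entries, chunksize):
    """Yield (start, stop) entry ranges covering n_entries in steps of chunksize."""
    n_chunks = int(ceil(float(n_entries) / chunksize))
    for chunk in range(n_chunks):
        start = chunk * chunksize
        # Assumption: clamp the last chunk so stop never exceeds n_entries.
        stop = min((chunk + 1) * chunksize, n_entries)
        yield start, stop


if __name__ == "__main__":
    print(normalize_paths("test.root"))
    print(list(chunk_bounds(10, 4)))
```

In the patch, each `(start, stop)` pair corresponds to the `start=` and `stop=` arguments passed to `root2array`, with `n_entries` taken from the `TChain` built over all paths.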

P.S. you're not the coolest guy ever (anymore) 😎


ibab commented on June 11, 2024

Okay, I've applied the changes and committed under your name 👼


alexpearce commented on June 11, 2024

🙏

