
root_pandas's Introduction

⚠️root_pandas is deprecated and unmaintained⚠️

root_pandas is built upon root_numpy which has not been actively maintained in several years. This is mostly due to the emergence of new alternatives which are both faster and more flexible.

root_pandas: conveniently loading/saving ROOT files as pandas DataFrames


root_pandas is a convenience package built around the root_numpy library. It allows you to easily load and store pandas DataFrames using the columnar ROOT data format used in high energy physics.

It's modeled closely after the existing pandas API for reading and writing HDF5 files. This means that in many cases, it is possible to substitute the use of HDF5 with ROOT and vice versa.

On top of that, root_pandas offers several features that go beyond what pandas offers with read_hdf and to_hdf.

These include

  • Specifying multiple input filenames, in which case they are read as if they were one continuous file.
  • Selecting several columns at once using * globbing and {A,B} shell patterns.
  • Flattening source files containing arrays, storing one array element per DataFrame row and duplicating any scalar variables.


Reading ROOT files

This is how you can read the contents of a ROOT file into a DataFrame:

from root_pandas import read_root

df = read_root('myfile.root')

If there are several ROOT trees in the input file, you have to specify the tree key:

df = read_root('myfile.root', 'mykey')

You can also directly read multiple ROOT files at once by passing a list of file names:

df = read_root(['file1.root', 'file2.root'], 'mykey')

In this case, each file must have the same set of columns under the given key.

Specific columns can be selected like this:

df = read_root('myfile.root', columns=['variable1', 'variable2'])

You can also use * in the column names to read in any matching branch:

df = read_root('myfile.root', columns=['variable*'])

In addition, you can use shell brace patterns as in

df = read_root('myfile.root', columns=['variable{1,2}'])

You can also use * and {a,b} simultaneously, and several times per string.

If you want to transform your variables using a ROOT selection string, put a noexpand: prefix in front of the column name in which you want to use the selection string:

df = read_root('myfile.root', columns=['noexpand:sqrt(variable1)'])

Working with stored arrays can be a bit inconvenient in pandas. root_pandas makes it easy to flatten your input data, providing you with a DataFrame containing only scalars:

df = read_root('myfile.root', columns=['arrayvariable', 'othervariable'], flatten=['arrayvariable'])

Assuming the ROOT file contains the array [1, 2, 3] in the first entry of the arrayvariable column, flattening will expand this into three entries, each containing one of the array elements. All other scalar entries are duplicated. The automatically created __array_index column also gives you the index that each array element had in its array before flattening.
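To make the flattening semantics concrete, here is a minimal sketch in plain pandas/numpy (no ROOT required) of what that single entry turns into; the names mirror the example above and are not part of the root_pandas API:

```python
import numpy as np
import pandas as pd

# One entry with an array branch and a scalar branch.  Flattening
# expands the array into one row per element, duplicates the scalar,
# and records each element's original position in __array_index.
entry = {'arrayvariable': np.array([1, 2, 3]), 'othervariable': 42}

flat = pd.DataFrame({
    'arrayvariable': entry['arrayvariable'],
    'othervariable': entry['othervariable'],   # scalar broadcast to 3 rows
    '__array_index': np.arange(len(entry['arrayvariable'])),
})
print(flat)
```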

There is also support for working with files that don't fit into memory: If the chunksize parameter is specified, read_root returns an iterator that yields DataFrames, each containing up to chunksize rows.

for df in read_root('bigfile.root', chunksize=100000):
    # process df here

If bigfile.root doesn't contain an index, the default indices of the individual DataFrame chunks will still increase continuously, as if they were parts of a single large DataFrame.

You can also combine any of the above options at the same time.

Reading in chunks also supports progress bars:

from progressbar import ProgressBar
pbar = ProgressBar()
for df in pbar(read_root('bigfile.root', chunksize=100000)):
    # process df here

# or
from tqdm import tqdm
for df in tqdm(read_root('bigfile.root', chunksize=100000), unit='chunks'):
    # process df here

Writing ROOT files

root_pandas patches the pandas DataFrame to have a to_root method that allows you to save it into a ROOT file:

df.to_root('out.root', key='mytree')

You can also call the to_root function and specify the DataFrame as the first argument:

to_root(df, 'out.root', key='mytree')

By default, to_root erases the existing contents of the file. Use mode='a' to append:

for df in read_root('bigfile.root', chunksize=100000):
    df.to_root('out.root', mode='a')

Warning: When using this feature to stream data from one ROOT file into another, you shouldn't forget to os.remove the output file first, otherwise you will append more and more data to it on each run of your program.
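A minimal sketch of that clean-up step, assuming the output path is 'out.root' as in the example above; contextlib.suppress makes the removal safe on the first run, when the file does not exist yet:

```python
import os
from contextlib import suppress

# Remove any stale output file before streaming chunks into it with
# mode='a'; suppressing FileNotFoundError keeps the first run safe.
with suppress(FileNotFoundError):
    os.remove('out.root')
```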

The DataFrame index

When reading a ROOT file, root_pandas will automatically add a pandas index to the DataFrame, which starts at 0 and counts up for each entry. When writing the DataFrame to a ROOT file, it stores the DataFrame index in a __index__ branch. Currently, only single-dimensional indices are supported.

root_pandas's People

Contributors

alexpearce, chrisburr, eduardo-rodrigues, ibab, konstantinschubert, maxnoe, pseyfert, remenska, sashabaranov, tmadlener


root_pandas's Issues

ROOT 6.12

I am having trouble running this with ROOT 6.12. When attempting to import root_pandas I get the following error:

ImportError: dlopen(/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/root_numpy-4.7.3-py2.7-macosx-10.13-x86_64.egg/root_numpy/_librootnumpy.so, 2): Library not loaded: /opt/local/libexec/root6/lib/root/libCore.6.10.so
Referenced from: /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/root_numpy-4.7.3-py2.7-macosx-10.13-x86_64.egg/root_numpy/_librootnumpy.so
Reason: image not found

Since I am on 6.12 I instead have a libCore.6.12.so file, but I am not sure how to fix this.

Tag a new version

Is it time for root_pandas 0.1.2? I keep finding myself doing

$ pip install --user git+git://github.com/ibab/root_pandas@2001dcc8675d19fce8b15f02f63aa47944eec3d6

to pick up the latest fix.

Move to a BSD-3 license

I would welcome that we move to a BSD-3 license as for all Scikit-HEP projects (where possible) and most of the popular Python scientific ecosystem packages.
@ibab, would you be OK with that? I know @chrisburr is happy with it, from discussions in person.
Thanks.

Implement ROOTStore

Pandas features HDFStore, a class for easily inspecting and modifying HDF5 files.
Implementing a similar ROOTStore could alleviate some of the pain points that remain when working with root_pandas.

Examples where a ROOTStore could speed up working with ROOT files:

  • Which trees are in this file?
  • How many rows/columns do they contain?
  • Saving several trees at once
  • Adding a tree to an existing file
  • Deleting a tree

Recursive brace expansion is broken

Dear all,

After updating to 0.3.2 I have encountered the following issue, which was working before.
The problem is the following: Suppose I want to expand the expression 'var{A,B}{1,2}' then I would expect it to be expanded to ['varA1', 'varA2', 'varB1', 'varB2']. Previously this was working as expected, however, with the current version I get the following: ['varB}{1', 'varA', 'var2'].

I think I have tracked it down to the brace expansion and the changes introduced in d571619, specifically I think it is the changes to the regular expression that introduced this behavior.

After having had a look at the tests, I see that this is probably not a completely unknown issue. Switching back to the regex as it was before the commit mentioned above, fixes the recursive expansion, but breaks the "multiple expansions with braces in name" case.

I haven't found a solution that works for both cases yet, unfortunately.

read_root not compatible to new pandas version

As announced in older pandas versions, the read_root method is no longer compatible with newer pandas versions such as 0.25.
The announcement for example in 0.24.2 is:
/opt/miniconda/envs/root_forge_36_test/lib/python3.6/site-packages/root_pandas/readwrite.py:320: FutureWarning: '.reindex_axis' is deprecated and will be removed in a future version. Use '.reindex' instead. df = df.reindex_axis(columns, axis=1, copy=False)

The error in pandas version 0.25 is:
File "/opt/miniconda/envs/root_forge_36_test/lib/python3.6/site-packages/root_pandas/readwrite.py", line 278, in read_root
    return convert_to_dataframe(arr)
File "/opt/miniconda/envs/root_forge_36_test/lib/python3.6/site-packages/root_pandas/readwrite.py", line 320, in convert_to_dataframe
    df = df.reindex_axis(columns, axis=1, copy=False)
File "/opt/miniconda/envs/root_forge_36_test/lib/python3.6/site-packages/pandas/core/generic.py", line 5180, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'reindex_axis'

Use uproot backend

The uproot package is a Python implementation of some parts of ROOT's I/O capabilities. Interestingly for root_pandas, this includes TTree reading.

We should consider adding uproot as a 'backend' for loading TTrees from ROOT files, in addition to root_numpy. uproot doesn't handle writing at the moment, but I think that's OK.

We might also consider making uproot the default, installing it when installing root_pandas, and falling back to root_numpy when the user tries to write a file (and then complaining if ROOT isn't found).

Install root_pandas with miniconda on CERN lxplus: version `CXXABI_1.3.8' not found

Hello everyone,
I hope I am asking this in the correct place.
I am trying to setup an environment using miniconda (conda 4.3.30) on CERN's lxplus to work with root_pandas.

This is what I do:
$ conda create --name=test-plain-pandas
$ source activate test-plain-pandas
$ conda install root_pandas
$ conda activate test

The installation goes fine. The list of packages installed is the following:
[screenshot: list of installed conda packages]

However, once the installation is done, if I try to run root:

(test-plain-pandas) [fadesse@lxplus069 ewp-Bd2Kstee-AngAna]$ root -b -q

I get the following error:

root: /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/libstdc++.so.6: version 'CXXABI_1.3.8' not found (required by /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/./libicui18n.so.58)
root: /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/libstdc++.so.6: version 'CXXABI_1.3.9' not found (required by /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/./libicui18n.so.58)
root: /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/libstdc++.so.6: version 'CXXABI_1.3.8' not found (required by /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/./libicuuc.so.58)
root: /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/libstdc++.so.6: version 'CXXABI_1.3.9' not found (required by /afs/cern.ch/work/f/fadesse/miniconda3/envs/test-plain-pandas/bin/../lib/./libicuuc.so.58)

Any idea what I am doing wrong?

Thank you for your help!

installing from anaconda

Dear sirs,
I'm trying to install root_pandas in my anaconda environment; so I tried this command from anaconda.org repository:

conda install -c nlesc root_pandas=python3

from https://anaconda.org/nlesc/root_pandas
Unfortunately, it fails:
............

Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata .........
Solving package specifications: ....

The following specifications were found to be in conflict:

  • python 2.7*
  • root_pandas python3*
    Use "conda info " to see the dependencies for each package.
    ............

I have python 3.5.1 installed, so I thought that using conda install -c nlesc root_pandas=python3 I would do the right thing. But the conflict is about python2.7, so I do not know what to do.

I'm using a Centos machine
Linux x86_64 x86_64 GNU/Linux
where I installed ROOT 6.09/01
running anaconda
Python 3.5.1 |Anaconda 4.1.0 (64-bit)| (default, Jun 15 2016, 15:32:45)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
with pandas.version '0.18.1'

Could you please help me?
Many thanks.
F. Ruffini

selection keyword in note?

Is this a feature that has not been implemented yet?

>>> df = read_root('test.root', 'MyTree', variables=['x_*', 'y_*'], selection='x_1 > 100')

Error if first file of the list does not contain the tree

Sometimes one has many files that are supposed to contain the tree one is interested in but, for example due to a tight selection, some may not contain it. It is nice, however, to be able to blindly give the list of files without worrying about which ones contain the tree and which do not.

If the file with the missing tree is not the first one, this can be achieved in read_root via the option warn_missing_tree=True, which is passed to root_numpy.root2array. If instead the first file is missing the tree, there is an error because of https://github.com/scikit-hep/root_pandas/blob/master/root_pandas/readwrite.py#L163 (seed_path = paths[0])
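Until the seed-file handling is fixed, one workaround sketch is to reorder the input list so that a file which actually contains the tree comes first. The tree_in_file predicate below is a caller-supplied stand-in to keep the sketch ROOT-free; with root_numpy one could implement it as `key in root_numpy.list_trees(path)`. This helper is hypothetical, not part of the read_root API:

```python
def reorder_paths(paths, key, tree_in_file):
    """Move a file containing the tree `key` to the front, so that
    read_root's seed file (paths[0]) is one that has the tree.
    `tree_in_file(path, key)` is a caller-supplied predicate."""
    good = [p for p in paths if tree_in_file(p, key)]
    bad = [p for p in paths if not tree_in_file(p, key)]
    if not good:
        raise IOError('no input file contains the tree %r' % key)
    return good + bad
```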

Cannot open file ending with ?svcClass=lhcbUser

I'm pretty sure this is going to result in a root bug (https://root.cern.ch/phpBB3/viewtopic.php?f=3&t=22427), but I cannot open files using xrootd which end in "?svcClass=lhcbUser". For instance in ipython:

In [7]: f = ROOT.TFile.Open('root://clhcbdlf.ads.rl.ac.uk//castor/ads.rl.ac.uk/prod/lhcb/user/a/adavis/2016_11/144593/144593984/DVTuples.root?svcClass=lhcbUser')

In [8]: tr = f.Get('Bs2DKMuNu/DecayTree')

works just fine, but

dfbs = root_pandas.read_root("root://clhcbdlf.ads.rl.ac.uk//castor/ads.rl.ac.uk/prod/lhcb/user/a/adavis/2016_11/144593/144593984/DVTuples.root?svcClass=lhcbUser",'Bs2DKMuNu/DecayTree',["Bs_M"])

results in

---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-6-21e630a2388a> in <module>()
----> 1 dfbs = root_pandas.read_root("root://clhcbdlf.ads.rl.ac.uk//castor/ads.rl.ac.uk/prod/lhcb/user/a/adavis/2016_11/144593/144593984/DVTuples.root?svcClass=lhcbUser",'Bs2DKMuNu/DecayTree',["Bs_M"])

/afs/cern.ch/user/a/adavis/.local/lib/python2.7/site-packages/root_pandas/readwrite.pyc in read_root(paths, key, columns, ignore, chunksize, where, flatten, *args, **kwargs)
    164         return genchunks()
    165 
--> 166     arr = root2array(paths, key, all_vars, selection=where, *args, **kwargs)
    167     if flatten:
    168         arr = do_flatten(arr)

/afs/cern.ch/user/a/adavis/.local/lib/python2.7/site-packages/root_numpy/_tree.pyc in root2array(filenames, treename, branches, selection, start, stop, step, include_weight, weight_name, cache_size, warn_missing_tree)
    207         weight_name,
    208         cache_size,
--> 209         warn_missing_tree)
    210 
    211     if flatten:

root_numpy/src/tree.pyx in _librootnumpy.root2array_fromfile (root_numpy/src/_librootnumpy.cpp:479)()

IOError: unable to access tree 'Bs2DKMuNu/DecayTree' in root://clhcbdlf.ads.rl.ac.uk//castor/ads.rl.ac.uk/prod/lhcb/user/a/adavis/2016_11/144593/144593984/DVTuples.root?svcClass=lhcbUser

Any thoughts?

specifying multi-index during read_root

This is an awesome tool!

Considering that HEP data typically has a variable number of objects and a natural division of the data (an event), it would be nice if this could be taken into account when converting a ROOT file. I think this would be addressed (as suggested in the title) by being able to specify a multi-index when calling read_root. For instance, I have event-by-event data in a ROOT file and each event has several vectors which are consistently sized within an event. I would like to specify something like:

df = read_root('my_data.root', columns=['track_pt', 'track_eta', 'track_phi'], index=['event', '__array_index'], flatten=True)

I was able to do this in two steps with just track_pt

In [26]: df = read_root('data/mydata.root', columns = ['event', 'track_pt'], flatten = True)
In [27]: df.index = [df.event, df.__array_index]                                                                                                                        
In [28]: df[:10]                                                                               
                     event   track_pt  __array_index

event __array_index              
3701  0               3701   2.806184              0
      1               3701   2.099216              1
      2               3701   1.563220              2
      3               3701  11.620861              3
      4               3701  -1.000000              4
      5               3701   0.338156              5
      6               3701  -2.725569              6
      7               3701  -0.955589              7
      8               3701   2.592065              8
      9               3701   1.000000              9

At some level this is a quality of life request, but this did not work when I specified additional variables.
Additionally it would be nice if you could somehow read by number of events instead of chunk size though maybe that's tricky to implement.

Thanks,
Nate

Problem installing root_pandas with anaconda

Hi everyone,
I'm trying to set up a conda environment with ROOT and root_pandas installed using the latest miniconda on my laptop.
I did the installation of ROOT in a fresh environment with success, but when I try to install root_pandas using pip install root_pandas --no-binary root_numpy, I get the following error:

[screenshot of the pip error]

Have you already encountered such a problem? Or does one of you have a working setup of the latest anaconda with root_pandas?

Thank you very much for your help!

Installation Instructions?

Hi there. I'm just trying to install this package but am not exactly sure how... I have python 2.7 (or python 3.5) and ROOT 5.34.36. Am I supposed to run setup.py with some certain options?

Thanks,
Michael

read_root with large number of input files scales badly

Hi,

I am using read_root to get dataframes which contain chunks (100000 rows) of my data from a large TChain (multiple thousand files). In doing so I noticed that the function scales badly with the number of files that is passed to it for creating the TChain. I used cProfile to test this hypothesis; attached are two screenshots of the output. One refers to the total data sample, while the other uses only 5% of my data with an equal chunksize. You can see that the time per call of genchunks is around 8 times as high as in the reference 5% sample.

I am able to bypass this behaviour by creating chunks of files before passing them to read_root, and was wondering whether this is a bug in root_pandas or root2array.

Cheers!
[screenshots: profiles for the 5% sample and the full sample]

root_pandas randomly shuffles index of columns

I recently realized that when constructing a DataFrame with root_pandas.read_root, the order of the columns gets randomly shuffled. Try the following:

wget http://scikit-hep.org/uproot/examples/HZZ.root

here is the test.py code:

#!/usr/bin/env python
import uproot
import root_pandas as rp
variables = ['MET_px', 'MET_py', 'EventWeight']
df=rp.read_root('HZZ.root', 'events', columns=variables)
events = uproot.open("HZZ.root")["events"]
df2=events.pandas.df(variables, flatten=False)
print(df.values[0])
print(df2.values[0])

So if you run this test.py code multiple times, you will see that the printed result from the root_pandas DataFrame (df) changes, but the DataFrame from uproot (df2) is always the same (and follows the order of the TBranch name list).

zhicaiz@zhicaiz ~$ python test.py
[2.5636332  5.912771   0.00927101]
[5.912771   2.5636332  0.00927101]
zhicaiz@zhicaiz ~$ python test.py
[0.00927101 2.5636332  5.912771  ]
[5.912771   2.5636332  0.00927101]

root_pandas version i used: v0.6.1

Provide flattening option to flat into columns

In our use case of ROOT files, we more often need an array flattened into columns, not rows.

E.g. we often save coordinates as arrays in the ROOT files. For these it makes more sense to flatten into columns:
cog[3] → cog_0, cog_1, cog_2
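A pure-pandas sketch of the requested behaviour, using the cog example above and assuming fixed-length arrays; none of this is existing root_pandas API:

```python
import pandas as pd

# Expand a fixed-length array column into scalar columns
# cog_0, cog_1, cog_2 instead of into extra rows.
df = pd.DataFrame({'cog': [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]})

cog = pd.DataFrame(df['cog'].tolist(),
                   columns=['cog_0', 'cog_1', 'cog_2'],
                   index=df.index)
df = df.drop(columns='cog').join(cog)
print(df)
```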

Require user to specify branches he wants to flatten

The optional argument flatten=True causes root_pandas to flatten out array-like branches in the root file by increasing the number of events in the tuple.

This can lead to unexpected side effects if more than one branch of the TTree contains arrays.

It also fails and causes an error if there are two branches with arrays of unequal length or if one branch contains tensors of dimension two or higher.

The proposal is to only flatten the branches which are specified. The remaining non-scalar branches will then be dropped just as in the case when flatten=False.
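A rough pandas sketch of the proposed behaviour for a single requested branch (illustrative only; DataFrame.explode requires pandas >= 0.25, and the real implementation would work on the numpy arrays before DataFrame construction):

```python
import numpy as np
import pandas as pd

def flatten_selected(df, branch):
    """Flatten only `branch`; drop the other non-scalar columns,
    just as when flatten=False.  Illustrative sketch, not the
    real root_pandas implementation."""
    nonscalar = [c for c in df.columns
                 if df[c].map(lambda v: isinstance(v, (list, np.ndarray))).any()]
    dropped = [c for c in nonscalar if c != branch]
    return df.drop(columns=dropped).explode(branch).reset_index(drop=True)
```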

Cannot install root_pandas on Google colab

Dear all,
after running the cell below, I get the following output:
!pip install --user root_pandas

Collecting root_pandas
Using cached https://files.pythonhosted.org/packages/d4/5e/9f2d6e1c904bf0015e1fdf6718de88f8554083fdd8e112f3abad980351c6/root_pandas-0.7.0.tar.gz
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from root_pandas) (1.16.4)
Requirement already satisfied: pandas>=0.18.0 in /usr/local/lib/python3.6/dist-packages (from root_pandas) (0.24.2)
Collecting root_numpy (from root_pandas)
Using cached https://files.pythonhosted.org/packages/d5/5f/82f5111c22599676eb8b5f9b1bf85c38dcc7995d52cd6b4a8f5f5caa4659/root_numpy-4.8.0.tar.gz
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Neither root_numpy nor root_pandas gets installed; all other libraries seem to work fine.

Thanks,
remkamal

IndexError when read_root used with chunksize returns an empty iterator

Dear all,

I have been facing the following issue. Suppose you have a bunch of large root files, so you want to use chunksize. But sometimes, you want to apply a tight cut, such that for some files, you end up with no entries. In this case,

for df in read_root(myfile, key=myTree, where=tight_selection, chunksize=100000):
     # Do something

raises an IndexError: index 0 is out of bounds for axis 0 with size 0 because the iterator returned by read_root has length zero.

I'm not sure what the best change is. I guess this is the part of read_root that has to be changed:

    if chunksize:
        tchain = ROOT.TChain(key)
        for path in paths:
            tchain.Add(path)
        n_entries = tchain.GetEntries()
        # XXX could explicitly clean up the opened TFiles with TChain::Reset

        def genchunks():
            current_index = 0
            for chunk in range(int(ceil(float(n_entries) / chunksize))):
                arr = root2array(paths, key, all_vars, start=chunk * chunksize, stop=(chunk+1) * chunksize, selection=where, *args, **kwargs)
                if flatten:
                    arr = do_flatten(arr, flatten)
                yield convert_to_dataframe(arr, start_index=current_index)
                current_index += len(arr)
        return genchunks()

I guess if n_entries == 0, one should do something special, but I'm not sure what's best. Maybe return None? In that case the user can do:

df_list = read_root(myfile, key=myTree, where=tight_selection, chunksize=100000)

if df_list is not None:
    for df in df_list:
         # Do something

?

expose list_branches

Usually when opening a ROOT file I don't know the names of the branches, so it would be good to expose a way to list the names of the branches without importing other modules. The same goes for the names of the trees.

different TTree name for different file

I have got different ROOT files. In each one I have a TTree (actually I have two TTrees per file, but I only need one). The names of the TTrees are not the same for every ROOT file (in fact, each tree is named after its ROOT file).

Is it possible to have an interface like:

read_root({'file1.root': 'tree1', 'file2.root': 'tree2', ...})

or

read_root(({'path': 'file1.root', 'key':'tree1'},
           {'path': 'file2.root', 'key':'tree2'}))
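Such an interface can be emulated today with a small wrapper. In the sketch below, `reader` is the single-file loader (e.g. root_pandas.read_root), injected as a parameter so the sketch stays ROOT-free; the helper itself is hypothetical, not part of the API:

```python
import pandas as pd

def read_root_map(file_key_pairs, reader):
    """Read several (path, tree_key) pairs and concatenate the
    resulting DataFrames; `reader(path, key)` does the actual I/O."""
    frames = [reader(path, key) for path, key in file_key_pairs]
    return pd.concat(frames, ignore_index=True)
```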

Ignored the following non-scalar branches

I am trying to read a TTree with branches filled with vectors, but it seems root_pandas is skipping them:

a = root_pandas.read_root("my_tree.root")
/home/nick/.local/lib/python3.6/site-packages/root_pandas/readwrite.py:221: UserWarning: Ignored the following non-scalar branches: <<branches names>>
  .format(bad_names=", ".join(nonscalar_columns)), UserWarning)

Isn't root_pandas supposed to read non-scalar branches?

Saving to a ROOT file with "/" in tree name

In normal ROOT, trees may live inside a TDirectory. This is transparent during reading (e.g., one can do f->Get("DirectoryName/TreeName") in C++).

However, when saving a DataFrame one cannot generate this directory structure. When saving a DataFrame with key="DirectoryName/TreeName", this will just become the name of the tree. This object cannot be read back, because f->Get("DirectoryName/TreeName") will look for a tree named TreeName within the TDirectory DirectoryName, instead of a tree with that full name.

Can't use formula as branch

root_numpy supports using formula as branch names. For example:

>>> root_numpy.root2array('f.root', 'tree', branches=['sqrt(mass)'], stop=3)
array([(739.8708446958261,), (1225.6622271912993,), (1096.497904427192,)],
      dtype=[('sqrt(mass)', '<f8')])

root_pandas doesn't support this, because it pattern matches branch names, failing if a name in the branches argument doesn't match a TBranch in the TTree.

Enhancement: Remove ROOT as a dependency.

Seriously :)

This should be a long-term goal because it would:
... make installing root_pandas so much easier.
... mean that a lot of users will not need to install ROOT at all.

I am aware that this is ambitious.

Since we depend on ROOT via root_numpy, the simplest way to go would be to remove the ROOT dependency in root_numpy. Alternatively, one could define a stripped-down version of root_numpy which contains only the parts that we actually depend on and then try to remove its dependency.

It is true, I have no idea how much work that is....

https://root.cern.ch/root/InputOutput.html

Problem with saving DataFrames to root files when they contain dtype object

Hi,

I have several text files with strings and floats in them. I can load them into a pandas DataFrame nicely.
I also need to save them in the ROOT format and wanted to use this library.

However, when I call the to_root function on my DataFrame I get the following:

UserWarning: converter for dtype('O') is not implemented skipping

And indeed, if I load the ROOT file later with read_root, the columns with strings in them are missing.

I then tried this:

>>> for c in df.columns:
...     if df[c].dtype == object:
...         df[c] = df[c].astype(str)

Now I did not get any error message using to_root. However, when I load the ROOT file into a pandas DataFrame later, it is still missing the columns where the strings are supposed to be.

How to fix this?
Thank you very much

Jupyter crash/Python quit unexpectedly

Jupyter/python crashes after:
df = read_root('Data/File.root')

[I 19:00:30.737 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel 5e893716-ea03-432d-840f-c6f7d2e1d050 restarted

Pandas, numpy and ROOT seem to import and work well.

This worked before with ROOT built against Python 2.7 and the macOS system Python 2.7.
Now I built ROOT using Python 3 and Anaconda, and it crashes every time.
I also tried with a Python installed via brew.

Is this a Python conflict? Or does it not work with Python 3 ROOT? I'm out of ideas.

cannot import root_pandas: Symbol not found

Today, I did pip install root_pandas --upgrade successfully from 0.3.x to 0.6.x. However, I cannot use it:

Python 2.7.14 |Anaconda custom (64-bit)| (default, Oct  5 2017, 02:28:52) 
Type "copyright", "credits" or "license" for more information.

IPython 5.4.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import root_pandas
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-2445605c5f03> in <module>()
----> 1 import root_pandas

/Users/michael/anaconda2/lib/python2.7/site-packages/root_pandas/__init__.py in <module>()
----> 1 from .readwrite import read_root
      2 from .readwrite import to_root
      3 from .version import __version__
      4 
      5 __all__ = [

/Users/michael/anaconda2/lib/python2.7/site-packages/root_pandas/readwrite.py in <module>()
      8 from pandas import DataFrame, RangeIndex
      9 import pandas as pd
---> 10 from root_numpy import root2array, list_trees
     11 import fnmatch
     12 from root_numpy import list_branches

/Users/michael/anaconda2/lib/python2.7/site-packages/root_numpy/__init__.py in <module>()
     49     del numpy_version_at_install
     50 
---> 51 from ._tree import (
     52     root2array, root2rec,
     53     tree2array, tree2rec,

/Applications/root_build/lib/ROOT.pyc in _importhook(name, *args, **kwds)
    461       except Exception:
    462          pass
--> 463    return _orig_ihook( name, *args, **kwds )
    464 
    465 __builtin__.__import__ = _importhook

/Users/michael/anaconda2/lib/python2.7/site-packages/root_numpy/_tree.py in <module>()
      4 
      5 from .extern.six import string_types
----> 6 from . import _librootnumpy
      7 
      8 

/Applications/root_build/lib/ROOT.pyc in _importhook(name, *args, **kwds)
    461       except Exception:
    462          pass
--> 463    return _orig_ihook( name, *args, **kwds )
    464 
    465 __builtin__.__import__ = _importhook

ImportError: dlopen(/Users/michael/anaconda2/lib/python2.7/site-packages/root_numpy/_librootnumpy.so, 2): Symbol not found: __ZNK5TFile12GetCacheReadEP7TObject
  Referenced from: /Users/michael/anaconda2/lib/python2.7/site-packages/root_numpy/_librootnumpy.so
  Expected in: /Applications/root_build/lib/libRIO.so
 in /Users/michael/anaconda2/lib/python2.7/site-packages/root_numpy/_librootnumpy.so

I'm using ROOT 6.17/01 and conda 4.5.11 on macOS Mojave.

write strings

I guess this is connected with #38, but I am using a much newer version of pandas (0.20.3):

df = pandas.DataFrame({"x": [10, 20, 30], "y": ['ten', 'twenty', 'thirty']})
print df.dtypes

# x     int64
# y    object
# dtype: object

df.to_root('t.root')
f = ROOT.TFile.Open("t.root")   
f.Get("default").Scan()       

# ************************************
# *    Row   *         x * __index__ *
# ************************************
# *        0 *        10 *         0 *
# *        1 *        20 *         1 *
# *        2 *        30 *         2 *
# ************************************

Why is the string column not written?

Implement unit tests

Would be nice to add unit tests before working on the ROOTStore and other improvements.

DataFrame chunks will have duplicate indices

If a ROOT file is loaded in chunks, the individual DataFrames will have the same starting value for the index. If you then save these chunks to a single file (with mode='a') and then load from that file in to a single DataFrame, its index will have duplicate values.

import numpy as np
import root_numpy
import root_pandas

# Create the file with root_numpy directly, so the input doesn't have an index
xs = np.array(np.vstack(np.random.normal(0, 1, 100)), dtype=[('x', float)])
root_numpy.array2root(xs, 'input.root', 'tree', mode='recreate')

# Read the file in chunks and then write to the output
# Use write-mode for the first chunk, then use append mode, to make sure the output is re-created
for idx, df in enumerate(root_pandas.read_root('input.root', chunksize=10)):
    if idx == 0:
        mode = 'w'
    else:
        mode = 'a'
    df.to_root('output.root', mode=mode)

df = root_pandas.read_root('output.root')
dup_mask = df.index.duplicated()
print(dup_mask.any(), df[dup_mask].index.size)

This prints True 90.

These duplicate values are problematic when performing certain operations.

One work-around is to set the index values by hand in the loop.

chunksize = 10
for idx, df in enumerate(root_pandas.read_root('input.root',
                                               chunksize=chunksize)):
    if idx == 0:
        writemode = 'w'
    else:
        writemode = 'a'
    # Offset the index of this chunk
    df.index += idx*chunksize
    df.to_root('output.root', mode=writemode)

Should root_pandas do this for us? It surprised me when I first saw it. But if you're only manipulating the chunks, i.e. not saving them to the same file, you won't encounter this problem, and maybe that's the more common use case of chunksize.
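A complementary workaround on the reading side (a plain-pandas sketch, independent of root_pandas) is to rebuild the index after loading the concatenated file:

```python
import pandas as pd

# Two chunks concatenated back-to-back end up with a duplicated index,
# just like reading back the appended output.root above.
df = pd.concat([pd.DataFrame({"x": [1.0, 2.0]}),
                pd.DataFrame({"x": [3.0, 4.0]})])
assert df.index.duplicated().any()

# Rebuild a unique 0..N-1 index, discarding the stored one.
df = df.reset_index(drop=True)
assert not df.index.duplicated().any()
```

This only helps if you don't need the original per-chunk index values, since reset_index(drop=True) discards them.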

Flattening two branches produces only one `__array_index`

I have a tree with DecayTreeFitter output. When I retrieve two array variables and use the flatten=True option of the read_root function, the resulting DataFrame contains only a single __array_index column, so it's not possible to tell which index belongs to which column.

Side note: addressing #28 would fix this; I may have a look at implementing it.

Accessing vectors from a ROOT file

Hi,

my ROOT file contains nested arrays, i.e. a 1D array inside a 1D array.
Every particle (currently a row in the DataFrame) has an array of values for different time slices (currently an array inside the row). Is it possible to access this information with root_pandas?
At the moment, elements from the nested arrays are all mixed together inside the row array.

segmentation violation when reading 1GB tuple

Using

df_new = read_root(signal_file_name, signal_tree_name)

I get a segmentation violation with the following stack trace:

*** Break *** segmentation violation
===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================

Thread 11 (Thread 0x7f5b27a8f700 (LWP 40581)):
#0  0x000000334540d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00007f5b32b22af8 in PyThread_acquire_lock (lock=0x7f5b20013df0, waitflag=<value optimized out>) at Python/thread_pthread.h:324
#2  0x00007f5b32b28344 in lock_PyThread_acquire_lock (self=0x7f5b171e4df0, args=<value optimized out>) at ./Modules/threadmodule.c:52
#3  0x00007f5b32aef153 in call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#4  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#5  0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2bf54330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0x7f5b27ec63b8, kwcount=0, defs=0x7f5b2bf602a8, defcount=1, closure=0x0) at Python/ceval.c:3265
#6  0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#7  call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#8  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#9  0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2bf54db0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b27ebeec8, kwcount=0, defs=0x7f5b2bf60468, defcount=1, closure=0x0) at Python/ceval.c:3265
#10 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#11 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#12 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#13 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28b885b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b329a5068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#14 0x00007f5b32a6e958 in function_call (func=0x7f5b28b9e2a8, arg=0x7f5b3269f950, kw=0x7f5b27ee55c8) at Objects/funcobject.c:526
#15 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b28b9e2a8, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#16 0x00007f5b32aed865 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4346
#17 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2718
#18 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28be1730, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0x7f5b27ec4c68, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#19 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#20 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#21 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#22 0x00007f5b32aef600 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4119
#23 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#24 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#25 0x00007f5b32aef600 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4119
#26 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#27 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#28 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2bf59330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#29 0x00007f5b32a6e851 in function_call (func=0x7f5b2bf64668, arg=0x7f5b3269f850, kw=0x0) at Objects/funcobject.c:526
#30 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b2bf64668, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#31 0x00007f5b32a5195f in instancemethod_call (func=0x7f5b2bf64668, arg=0x7f5b3269f850, kw=0x0) at Objects/classobject.c:2602
#32 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b28c15410, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#33 0x00007f5b32ae8823 in PyEval_CallObjectWithKeywords (func=0x7f5b28c15410, arg=0x7f5b329a5050, kw=<value optimized out>) at Python/ceval.c:3902
#34 0x00007f5b32b28832 in t_bootstrap (boot_raw=<value optimized out>) at ./Modules/threadmodule.c:614
#35 0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#36 0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f5b1d906700 (LWP 40587)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f5b1cf05700 (LWP 40590)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f5b1c504700 (LWP 40591)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f5b1bb03700 (LWP 40592)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f5b1b102700 (LWP 40593)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f5b1a701700 (LWP 40594)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f5b19d00700 (LWP 40596)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5b192ff700 (LWP 40598)):
#0  0x000000334540b5bc in pthread_cond_wait

GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f5b1dc85648 in th_worker (tidptr=<value optimized out>) at numexpr/module.cpp:57
#2  0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f5b0eee1700 (LWP 40616)):
#0  0x000000334540d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00007f5b32b22af8 in PyThread_acquire_lock (lock=0x1a8cda0, waitflag=<value optimized out>) at Python/thread_pthread.h:324
#2  0x00007f5b32ae9474 in PyEval_RestoreThread (tstate=0x3ba9790) at Python/ceval.c:357
#3  0x00007f5b2c18745f in floatsleep (self=<value optimized out>, args=<value optimized out>) at -------src-dir-------/Python-2.7.9/Modules/timemodule.c:959
#4  time_sleep (self=<value optimized out>, args=<value optimized out>) at -------src-dir-------/Python-2.7.9/Modules/timemodule.c:206
#5  0x00007f5b32aef153 in call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4033
#6  PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#7  0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b16d29e30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b329a5068, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#8  0x00007f5b32a6e958 in function_call (func=0x7f5b11ab47d0, arg=0x7f5b18826050, kw=0x7f5b119bc050) at Objects/funcobject.c:526
#9  0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b11ab47d0, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#10 0x00007f5b32aed865 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4346
#11 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2718
#12 0x00007f5b32aef600 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4119
#13 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#14 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#15 0x00007f5b32aef600 in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4119
#16 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#17 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#18 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2bf59330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#19 0x00007f5b32a6e851 in function_call (func=0x7f5b2bf64668, arg=0x7f5b188260d0, kw=0x0) at Objects/funcobject.c:526
#20 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b2bf64668, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#21 0x00007f5b32a5195f in instancemethod_call (func=0x7f5b2bf64668, arg=0x7f5b188260d0, kw=0x0) at Objects/classobject.c:2602
#22 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b17dd6eb0, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#23 0x00007f5b32ae8823 in PyEval_CallObjectWithKeywords (func=0x7f5b17dd6eb0, arg=0x7f5b329a5050, kw=<value optimized out>) at Python/ceval.c:3902
#24 0x00007f5b32b28832 in t_bootstrap (boot_raw=<value optimized out>) at ./Modules/threadmodule.c:614
#25 0x00000033454079d1 in start_thread () from /lib64/libpthread.so.0
#26 0x00000033450e88fd in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f5b329e6700 (LWP 40578)):
#0  0x00000033450ac65d in waitpid () from /lib64/libc.so.6
#1  0x000000334503e609 in do_system () from /lib64/libc.so.6
#2  0x000000334503e940 in system () from /lib64/libc.so.6
#3  0x00007f5b162f40cf in TUnixSystem::StackTrace() () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.28/x86_64-slc6-gcc48-opt/root/lib/libCore.so
#4  0x00007f5b162f5c3c in TUnixSystem::DispatchSignals(ESignals) () from /afs/cern.ch/sw/lcg/app/releases/ROOT/5.34.28/x86_64-slc6-gcc48-opt/root/lib/libCore.so
#5  <signal handler called>
#6  0x00007f5b26465148 in PyArray_Item_XDECREF (data=0x7f5aac493010 "\345{\230\345\225w\250?kυ\t\317\024\v
", descr=0x7f5b0e48cce8) at numpy/core/src/multiarray/refcount.c:71
#7  PyArray_Item_XDECREF (data=0x7f5aac493010 "\345{\230\345\225w\250?kυ\t\317\024\v
", descr=0x7f5b0e48cce8) at numpy/core/src/multiarray/refcount.c:87
#8  0x00007f5b264653cf in PyArray_XDECREF (mp=0x7f5b0e48ead0) at numpy/core/src/multiarray/refcount.c:173
#9  0x00007f5b2637bc84 in array_dealloc (self=0x7f5b0e48ead0) at numpy/core/src/multiarray/arrayobject.c:417
#10 0x00007f5b32a8410b in dict_dealloc (mp=0x7f5b07ff9910) at Objects/dictobject.c:1010
#11 0x00007f5b32a6d6b7 in frame_dealloc (f=0x3bd5d60) at Objects/frameobject.c:471
#12 0x00007f5b32b187bb in tb_dealloc (tb=0x7f5b11a156c8) at Python/traceback.c:28
#13 0x00007f5b32b187cb in tb_dealloc (tb=0x7f5b11a15bd8) at Python/traceback.c:27
#14 0x00007f5b32b187cb in tb_dealloc (tb=0x7f5b11a155f0) at Python/traceback.c:27
#15 0x00007f5b32a82047 in insertdict_by_entry (mp=0x7f5b269dc050, key=0x7f5b329abaf8, hash=<value optimized out>, ep=<value optimized out>, value=<value optimized out>) at Objects/dictobject.c:519
#16 0x00007f5b32a8549c in insertdict (op=0x7f5b269dc050, key=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/dictobject.c:556
#17 dict_set_item_by_hash_or_entry (op=0x7f5b269dc050, key=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/dictobject.c:765
#18 PyDict_SetItem (op=0x7f5b269dc050, key=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/dictobject.c:818
#19 0x00007f5b32a89d65 in _PyObject_GenericSetAttrWithDict (obj=<value optimized out>, name=0x7f5b329abaf8, value=0x7f5b11a337e8, dict=0x7f5b269dc050) at Objects/object.c:1529
#20 0x00007f5b32a8a257 in PyObject_SetAttr (v=0x7f5b29386e10, name=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/object.c:1252
#21 0x00007f5b32aebf16 in PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2013
#22 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b29417c30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=4, kws=0x654bcfa0, kwcount=1, defs=0x7f5b2940c908, defcount=5, closure=0x0) at Python/ceval.c:3265
#23 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#24 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#25 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#26 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bbe6b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x65676ec8, kwcount=0, defs=0x7f5b28666640, defcount=4, closure=0x0) at Python/ceval.c:3265
#27 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#28 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#29 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#30 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bce1b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3, kws=0x3bd35f0, kwcount=0, defs=0x7f5b2866eca8, defcount=1, closure=0x0) at Python/ceval.c:3265
#31 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#32 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#33 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#34 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bce0b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3, kws=0x3bd2420, kwcount=3, defs=0x7f5b28ba3888, defcount=3, closure=0x0) at Python/ceval.c:3265
#35 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#36 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#37 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#38 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bc5f30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0x7f5b269b8410, kwcount=1, defs=0x7f5b28ba3838, defcount=3, closure=0x0) at Python/ceval.c:3265
#39 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#40 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#41 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#42 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2867a330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x1c4c360, kwcount=1, defs=0x7f5b28ba7e28, defcount=1, closure=0x0) at Python/ceval.c:3265
#43 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#44 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#45 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#46 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2867a230, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b2819a7c0, kwcount=0, defs=0x7f5b28ba7de8, defcount=1, closure=0x0) at Python/ceval.c:3265
#47 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#48 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#49 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#50 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28685c30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b2843cea0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#51 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#52 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#53 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#54 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2c79f330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b32670848, kwcount=1, defs=0x7f5b29404068, defcount=1, closure=0x0) at Python/ceval.c:3265
#55 0x00007f5b32a6e958 in function_call (func=0x7f5b29403848, arg=0x7f5b3267f310, kw=0x7f5b28176b40) at Objects/funcobject.c:526
#56 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b29403848, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#57 0x00007f5b32aed865 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4346
#58 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2718
#59 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b32689130, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x7f5b329ce388, kwcount=0, defs=0x7f5b3269ffa8, defcount=1, closure=0x0) at Python/ceval.c:3265
#60 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#61 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#62 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#63 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b328ddc30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#64 0x00007f5b32af0d82 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>) at Python/ceval.c:667
#65 0x00007f5b32b10ba0 in run_mod (fp=0x18ce560, filename=<value optimized out>, start=<value optimized out>, globals=0x7f5b3297a168, locals=0x7f5b3297a168, closeit=1, flags=0x7fff552d80d0) at Python/pythonrun.c:1371
#66 PyRun_FileExFlags (fp=0x18ce560, filename=<value optimized out>, start=<value optimized out>, globals=0x7f5b3297a168, locals=0x7f5b3297a168, closeit=1, flags=0x7fff552d80d0) at Python/pythonrun.c:1357
#67 0x00007f5b32b10d7f in PyRun_SimpleFileExFlags (fp=0x18ce560, filename=0x7fff552d97ac "/home/rniet/anaconda/bin/ipython", closeit=1, flags=0x7fff552d80d0) at Python/pythonrun.c:949
#68 0x00007f5b32b26664 in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:645
#69 0x000000334501ed5d in __libc_start_main () from /lib64/libc.so.6
#70 0x0000000000400649 in _start ()
===========================================================


The lines below might hint at the cause of the crash.
If they do not help you then please submit a bug report at
http://root.cern.ch/bugs. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x00007f5b26465148 in PyArray_Item_XDECREF (data=0x7f5aac493010 "345{230345225w250?kυt317024v
", descr=0x7f5b0e48cce8) at numpy/core/src/multiarray/refcount.c:71
#7  PyArray_Item_XDECREF (data=0x7f5aac493010 "345{230345225w250?kυt317024v
", descr=0x7f5b0e48cce8) at numpy/core/src/multiarray/refcount.c:87
#8  0x00007f5b264653cf in PyArray_XDECREF (mp=0x7f5b0e48ead0) at numpy/core/src/multiarray/refcount.c:173
#9  0x00007f5b2637bc84 in array_dealloc (self=0x7f5b0e48ead0) at numpy/core/src/multiarray/arrayobject.c:417
#10 0x00007f5b32a8410b in dict_dealloc (mp=0x7f5b07ff9910) at Objects/dictobject.c:1010
#11 0x00007f5b32a6d6b7 in frame_dealloc (f=0x3bd5d60) at Objects/frameobject.c:471
#12 0x00007f5b32b187bb in tb_dealloc (tb=0x7f5b11a156c8) at Python/traceback.c:28
#13 0x00007f5b32b187cb in tb_dealloc (tb=0x7f5b11a15bd8) at Python/traceback.c:27
#14 0x00007f5b32b187cb in tb_dealloc (tb=0x7f5b11a155f0) at Python/traceback.c:27
#15 0x00007f5b32a82047 in insertdict_by_entry (mp=0x7f5b269dc050, key=0x7f5b329abaf8, hash=<value optimized out>, ep=<value optimized out>, value=<value optimized out>) at Objects/dictobject.c:519
#16 0x00007f5b32a8549c in insertdict (op=0x7f5b269dc050, key=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/dictobject.c:556
#17 dict_set_item_by_hash_or_entry (op=0x7f5b269dc050, key=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/dictobject.c:765
#18 PyDict_SetItem (op=0x7f5b269dc050, key=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/dictobject.c:818
#19 0x00007f5b32a89d65 in _PyObject_GenericSetAttrWithDict (obj=<value optimized out>, name=0x7f5b329abaf8, value=0x7f5b11a337e8, dict=0x7f5b269dc050) at Objects/object.c:1529
#20 0x00007f5b32a8a257 in PyObject_SetAttr (v=0x7f5b29386e10, name=0x7f5b329abaf8, value=0x7f5b11a337e8) at Objects/object.c:1252
#21 0x00007f5b32aebf16 in PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2013
#22 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b29417c30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=4, kws=0x654bcfa0, kwcount=1, defs=0x7f5b2940c908, defcount=5, closure=0x0) at Python/ceval.c:3265
#23 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#24 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#25 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#26 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bbe6b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x65676ec8, kwcount=0, defs=0x7f5b28666640, defcount=4, closure=0x0) at Python/ceval.c:3265
#27 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#28 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#29 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#30 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bce1b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3, kws=0x3bd35f0, kwcount=0, defs=0x7f5b2866eca8, defcount=1, closure=0x0) at Python/ceval.c:3265
#31 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#32 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#33 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#34 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bce0b0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3, kws=0x3bd2420, kwcount=3, defs=0x7f5b28ba3888, defcount=3, closure=0x0) at Python/ceval.c:3265
#35 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#36 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#37 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#38 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28bc5f30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2, kws=0x7f5b269b8410, kwcount=1, defs=0x7f5b28ba3838, defcount=3, closure=0x0) at Python/ceval.c:3265
#39 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#40 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#41 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#42 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2867a330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x1c4c360, kwcount=1, defs=0x7f5b28ba7e28, defcount=1, closure=0x0) at Python/ceval.c:3265
#43 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#44 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#45 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#46 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2867a230, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b2819a7c0, kwcount=0, defs=0x7f5b28ba7de8, defcount=1, closure=0x0) at Python/ceval.c:3265
#47 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#48 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#49 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#50 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b28685c30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b2843cea0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#51 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#52 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#53 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#54 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b2c79f330, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=1, kws=0x7f5b32670848, kwcount=1, defs=0x7f5b29404068, defcount=1, closure=0x0) at Python/ceval.c:3265
#55 0x00007f5b32a6e958 in function_call (func=0x7f5b29403848, arg=0x7f5b3267f310, kw=0x7f5b28176b40) at Objects/funcobject.c:526
#56 0x00007f5b32a3f323 in PyObject_Call (func=0x7f5b29403848, arg=<value optimized out>, kw=<value optimized out>) at Objects/abstract.c:2529
#57 0x00007f5b32aed865 in ext_do_call (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4346
#58 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2718
#59 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b32689130, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x7f5b329ce388, kwcount=0, defs=0x7f5b3269ffa8, defcount=1, closure=0x0) at Python/ceval.c:3265
#60 0x00007f5b32aef2aa in fast_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4129
#61 call_function (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:4054
#62 PyEval_EvalFrameEx (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:2679
#63 0x00007f5b32af0c6e in PyEval_EvalCodeEx (co=0x7f5b328ddc30, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3265
#64 0x00007f5b32af0d82 in PyEval_EvalCode (co=<value optimized out>, globals=<value optimized out>, locals=<value optimized out>) at Python/ceval.c:667
#65 0x00007f5b32b10ba0 in run_mod (fp=0x18ce560, filename=<value optimized out>, start=<value optimized out>, globals=0x7f5b3297a168, locals=0x7f5b3297a168, closeit=1, flags=0x7fff552d80d0) at Python/pythonrun.c:1371
#66 PyRun_FileExFlags (fp=0x18ce560, filename=<value optimized out>, start=<value optimized out>, globals=0x7f5b3297a168, locals=0x7f5b3297a168, closeit=1, flags=0x7fff552d80d0) at Python/pythonrun.c:1357
#67 0x00007f5b32b10d7f in PyRun_SimpleFileExFlags (fp=0x18ce560, filename=0x7fff552d97ac "/home/rniet/anaconda/bin/ipython", closeit=1, flags=0x7fff552d80d0) at Python/pythonrun.c:949
#68 0x00007f5b32b26664 in Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:645
#69 0x000000334501ed5d in __libc_start_main () from /lib64/libc.so.6
#70 0x0000000000400649 in _start ()
===========================================================

root_pandas fails to read in root file with index

If I store the DataFrame index in the ROOT file with a script like this:

import pandas as pd
import numpy as np
import root_pandas as rpd  # also attaches DataFrame.to_root

data = pd.DataFrame()
data["x"] = np.random.normal(0, 1, 1000)
data["y"] = np.random.normal(0, 1, 1000)

data.to_root("test.root", "test")

df = rpd.read_root("test.root", "test")
df.plot(x="x", y="y", kind="scatter")

it throws a TypeError saying it cannot convert the Long64_t index branch:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    df = rpd.read_root("test.root", "test")
  File "/home/max_noethe/.local/lib/python2.7/site-packages/root_pandas/__init__.py", line 128, in read_root
    arr = root2array(path, tree_key, all_vars, selection=where, *kargs, **kwargs)
  File "/home/max_noethe/.local/lib/python2.7/site-packages/root_numpy/_tree.py", line 199, in root2array
    weight_name)
  File "tree.pyx", line 708, in _librootnumpy.root2array_fromFname (root_numpy/src/_librootnumpy.cpp:708)
  File "tree.pyx", line 623, in _librootnumpy.tree2array (root_numpy/src/_librootnumpy.cpp:623)
TypeError: cannot convert leaf index of branch index with type Long64_t (skipping)
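The index that to_root stores is materialized as an int64 column, which root_numpy maps to ROOT's Long64_t. This pure-pandas sketch (no ROOT needed) shows where that dtype comes from:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.random.normal(0, 1, 1000),
                   "y": np.random.normal(0, 1, 1000)})

# Materializing the RangeIndex yields an int64 column -- this is the
# branch that root_numpy would write as Long64_t and later fail to read.
index_col = df.reset_index()["index"]
print(index_col.dtype)  # int64
```

If that diagnosis is right, a workaround is to skip the index branch on the read side, e.g. read_root("test.root", "test", columns=["x", "y"]), assuming the columns argument leaves unlisted branches untouched.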

Installing root_pandas on lxplus

In preparation for the upcoming LHCb Starterkit, I tried to install root_pandas on lxplus using the instructions on the Starterkit webpage.

I get the following error.

(test-venv2) [fadesse@lxplus029 ~]$ python -m pip install --user --no-binary root_numpy root_pandas
Collecting root_pandas
  Using cached https://files.pythonhosted.org/packages/de/c1/ab626834bf8821c3acc7051ded702d2eb8edac85115d8177f39e9324797f/root_pandas-0.6.0.tar.gz
Collecting numpy (from root_pandas)
  Using cached https://files.pythonhosted.org/packages/14/1c/546724245c8b3aad39d807a0bed14a37b39943860c6b34456a363076c65b/numpy-1.15.2-cp34-cp34m-manylinux1_x86_64.whl
Collecting pandas>=0.18.0 (from root_pandas)
  Using cached https://files.pythonhosted.org/packages/08/01/803834bc8a4e708aedebb133095a88a4dad9f45bbaf5ad777d2bea543c7e/pandas-0.22.0.tar.gz
  Installing build dependencies ... done
Collecting root_numpy (from root_pandas)
  Using cached https://files.pythonhosted.org/packages/d7/44/4165539b7a62de78e56cd7520fd79d19a07ab6fc7b6ccb581044eee5aca1/root_numpy-4.7.3.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/tmp/fadesse/pip-install-kpvpn5p6/root-numpy/setup.py", line 59, in <module>
        root_version = root_version_installed(root_config)
      File "<string>", line 104, in root_version_installed
      File "/usr/lib64/python3.4/subprocess.py", line 856, in __init__
        restore_signals, start_new_session)
      File "/usr/lib64/python3.4/subprocess.py", line 1464, in _execute_child
        raise child_exception_type(errno_num, err_msg)
    FileNotFoundError: [Errno 2] No such file or directory: 'bin/root-config'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/fadesse/pip-install-kpvpn5p6/root-numpy/setup.py", line 65, in <module>
        rootsys, root_config))
    RuntimeError: ROOTSYS is  but running bin/root-config failed
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/fadesse/pip-install-kpvpn5p6/root-numpy/

I think it's related to how root_numpy locates ROOT.

I had previously updated pip and setuptools (just in case); not updating them gives the same result.

I use the default lxplus7 settings with python3 (in a venv, but I don't think this makes any difference):

(test-venv2) [fadesse@lxplus029 ~]$ which python
~/test-venv2/bin/python
(test-venv2) [fadesse@lxplus029 ~]$ python --version
Python 3.4.9
(test-venv2) [fadesse@lxplus029 ~]$ which root
/usr/bin/root
(test-venv2) [fadesse@lxplus029 ~]$ pip list
Package    Version
---------- -------
pip        18.0   
setuptools 40.4.3 

Thanks for your help! :)
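The RuntimeError ("ROOTSYS is  but running bin/root-config failed") suggests that setup.py cannot execute root-config because ROOTSYS is empty in the environment. A quick shell check before retrying the install (assuming a standard ROOT installation that ships bin/thisroot.sh):

```shell
# The RuntimeError above means setup.py could not run root-config.
# Verify that ROOT is set up in the current shell before retrying:
if command -v root-config >/dev/null 2>&1; then
    root-config --version
else
    echo "root-config not found; source ROOT's bin/thisroot.sh first"
fi
```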

Handle array access sensibly

Per-entry arrays are used quite heavily in ROOT files.
It would be nice if specifying an array index in the columns argument automatically read out that array element for each entry in the tree:

df = read_root('file.root', columns=['myarray[0]'])

The resulting column should probably get a different name, like myarray_0, to avoid problems when writing it back to disk.
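A minimal sketch of the proposed name mangling, using only the standard library (the helper name to_array_name is hypothetical, not part of root_pandas):

```python
import re

def to_array_name(column):
    """Map an array-access column like 'myarray[0]' to a flat,
    disk-safe name like 'myarray_0'; plain names pass through."""
    match = re.fullmatch(r"(\w+)\[(\d+)\]", column)
    if match:
        return "{}_{}".format(*match.groups())
    return column

print(to_array_name("myarray[0]"))  # myarray_0
print(to_array_name("scalar"))      # scalar
```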

Accept lists of files

root_numpy.root2array and friends can accept lists of files. It would be handy if root_pandas could support them as well.

I'm working on a PR, and I have a couple of thoughts:

  1. The single file argument is used to retrieve the list of trees, and then the list of branches in the chosen tree. What should be done when multiple files are specified? I think we should use the first file in the list, and throw an exception if later files don't match the format of the first. (Should we explicitly throw an exception, or just wait for root_numpy to fail when it can't access the tree/branch?)
  2. If a chunksize is specified, a generator is returned that steps through the single file in strides of chunksize rows. With multiple files, we need to step through files. I don't think this will be a problem, as root_numpy handles lists of files as a TChain, so it should be handled transparently, but this logic will need tweaking.
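The chunking logic in point 2 can be sketched independently of ROOT. Here each "file" is stood in for by a plain list of rows, concatenated the way a TChain concatenates trees; iterate_chunks is a hypothetical helper, not root_pandas API:

```python
from itertools import chain, islice

def iterate_chunks(files, chunksize):
    """Yield lists of at most `chunksize` rows, reading the input
    'files' as one continuous stream (like a TChain would)."""
    stream = chain.from_iterable(files)
    while True:
        chunk = list(islice(stream, chunksize))
        if not chunk:
            return
        yield chunk

chunks = list(iterate_chunks([[1, 2, 3], [4, 5]], chunksize=2))
print(chunks)  # [[1, 2], [3, 4], [5]]
```

Note that chunk boundaries need not align with file boundaries, which is exactly the behavior a TChain gives for free.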
