datreant's Introduction

datreant: persistent, pythonic trees for heterogeneous data

In many fields of science, especially those analyzing experimental or simulation data, there is often an existing ecosystem of specialized tools and file formats which new tools must work around, for better or worse. Furthermore, centralized database solutions may be suboptimal for data storage for a number of reasons, including insufficient hardware infrastructure, the variety and heterogeneity of raw data, the need for data portability, and so on. This is particularly true for fields centered on simulation: simulation systems can vary widely in size, composition, rules, parameters, and starting conditions. And with increases in computational power, it is often necessary to store intermediate results obtained from large amounts of simulation data so they can be accessed and explored interactively.

These problems make data management difficult, and serve as a barrier to answering scientific questions. To make things easier, datreant is a Python package that addresses the tedious and time-consuming logistics of intermediate data storage and retrieval. It solves a boring problem, so we can focus on interesting ones.

For more information on what datreant is and what it does, check out the official documentation.

Getting datreant

See the installation instructions for details. The package itself is pure Python.

If you want to work on the code, either for yourself or to contribute back to the project, clone the repository to your local machine with:

git clone https://github.com/datreant/datreant.git

Contributing

This project is still under heavy development, and there are certainly rough edges and bugs. Issues and pull requests welcome!

Check out our contributor's guide to learn how to get started with contributing back.

datreant's People

Contributors

andreabedini, dotsdl, gabrielelanaro, jdetle, kaceyaurum, kain88-de, mimischi, orbeckst, richardjgowers, sseyler, testmaxli

datreant's Issues

Need mechanism for updating file schema from previous versions

Closely related to datreant/MDSynthesis#36

Upon loading an existing Treant, the corresponding TreantFile class should do a version check between the version given by the state file and the current version of datreant. It should then run code that updates the schema to that used by the current version of datreant.

This will require:

  1. an explicitly-documented schema spec for each TreantFile class, which can change each release
  2. a mechanism for performing iterative updates to existing files; in other words, a file that was made with a very old version of datreant gets schema updates for each version of datreant it is behind, until it reaches the current one
  3. the mechanism to update files needs to be stress-tested with its own set of unit tests; it should be robust enough to restart conversion of a file that is only half-converted to a new version, which might happen if the Python session dies mid-conversion. (A sketch of the iterative update loop follows.)
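
A possible shape for the iterative update machinery (a hypothetical sketch; the registry and function names are not datreant's actual API):

SCHEMA_UPDATES = {}   # maps a schema version to the function updating it to the next

def register_update(version):
    def decorator(func):
        SCHEMA_UPDATES[version] = func
        return func
    return decorator

@register_update('0.5.0')
def to_0_6_0(state):
    # hypothetical change: tags stored as a flat list instead of a dict
    state['tags'] = list(state.get('tags', {}))
    state['version'] = '0.6.0'
    return state

def upgrade(state, current_version):
    # apply one version's update at a time until the state is current;
    # each update should be idempotent so a half-converted file can be
    # restarted safely (point 3 above)
    while state['version'] != current_version:
        state = SCHEMA_UPDATES[state['version']](state)
    return state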

Add `walk` generator to Tree.

It would be a nice convenience to walk a Tree exactly as with scandir.walk:

import datreant.core as dtr

for root, dirs, files in dtr.Tree('mytree').walk():
    print(root, dirs, files)   # do stuff with each level of the tree

This should be pretty straightforward.
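
A minimal sketch of such a method, assuming the Tree exposes its absolute path as abspath (os.walk stands in here for scandir.walk, which has the same interface):

import os

class Tree(object):
    def __init__(self, abspath):
        self.abspath = abspath

    def walk(self, topdown=True):
        # yield (root, dirs, files) tuples exactly as os.walk/scandir.walk do
        for root, dirs, files in os.walk(self.abspath, topdown=topdown):
            yield root, dirs, files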

Need mechanisms for copying filesystem objects

We need convenient mechanisms for copying filesystem objects (Treants, Trees, Leaves) to other locations in the filesystem. Careful attention needs to be given to destructive cases to make sure these are as safe to use as possible.

Switch to numpy-style docs.

Because this library is targeted toward scientific users, and interfaces with scientific packages, we should use the accepted numpy standard for docstring style.

datreant.core.discover() shouldn't look through whole tree when depth specified

Currently, we use scandir.walk() to walk directories in datreant.core.discover. This is just a re-implementation of the standard os.walk using scandir.scandir as a fast way to deliver directory contents. Because scandir.walk does not allow specifying a depth of any kind, we limit the depth externally, but the iteration of the full tree goes on regardless.

This is a poor implementation, so we should do something to keep the walk from progressing beyond the depth. Some options from @benhoyt at benhoyt/scandir#20:

The most straight-forward way to get what you want is to reimplement the subset of walk() that you need using scandir, and add a depth parameter to limit the recursion depth.

You could also use the existing scandir.walk() and modify the dirnames list in place -- i.e., clear out the dir names when the root is more than N levels deep. See more in the os.walk() docs here: https://docs.python.org/2/library/os.html#os.walk

The second one is probably the easiest to slot in quickly right now, but the first might be more beneficial in the long term.
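
A sketch of the second option, pruning dirnames in place so the walk never descends past the requested depth (os.walk stands in for scandir.walk):

import os

def walk_limited(top, depth=None):
    # depth=0 yields only `top` itself; depth=None walks the full tree
    top = top.rstrip(os.sep)
    base_level = top.count(os.sep)
    for root, dirs, files in os.walk(top):
        yield root, dirs, files
        if depth is not None and root.count(os.sep) - base_level >= depth:
            dirs[:] = []   # clearing in place stops descent below this root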

Need stress tests for file locking / concurrency

Despite the advantages of structuring Containers with their stored data at the filesystem level instead of shoving them into a single database, the major disadvantage is that we have to be careful that file locks work to guard against corruption from concurrent access. We need stress tests of the file locking mechanisms for:

  • ContainerFile and its child classes
  • DataFile variants

We also need to test that changes in filesystem elements that occur through Container properties (e.g. #5) are also handled gracefully, though this will almost certainly require the underlying ContainerFile object to use the Foxhound to fetch its new path.

Make all set operations work for collections

Collections (Bundles and Group.members) behave as ordered sets in that only one instance of a Treant is present as a member (no redundancy). It would be useful if all the operators that work for sets also worked for collections; the full set of operators should be implemented, as illustrated below.
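
For illustration, the proposed semantics (Bundle construction and operator behavior here are assumptions, not current functionality):

import datreant.core as dtr

b1 = dtr.Bundle('maple', 'oak')
b2 = dtr.Bundle('oak', 'elm')

b1 | b2   # union: maple, oak, elm
b1 & b2   # intersection: oak
b1 - b2   # difference: maple
b1 ^ b2   # symmetric difference: maple, elm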

treant names cannot contain dots '.'

If I use a name like A.B.C.D for a treant, then sooner or later I get a

KeyError: B

I should be able to use pretty much any character, except perhaps '/' or ':' (whatever os.sep really is).

Change package to `datreant.core`?

My broader vision for datreant is for it to provide core objects (Treants) for pythonic manipulation of directory trees containing many files, along with objects that help with working on aggregations of these (Bundle). Other modules, such as datreant.data, provide extended functionality to these objects upon import, but may require specialized dependencies. Meanwhile, domain-specific packages like MDSynthesis come batteries-included, with all the datreant submodules needed to provide the user experience they seek to give, plus whatever else.

So, in order for this to work technically, there cannot be a datreant module per se. Instead, datreant must become a namespace package, which on import cannot really contain anything itself. What is currently datreant must be moved to a submodule, such as datreant.core.

Before making this move, I think some debate is warranted. Thoughts? If no one ends up minding, I'll go ahead and make it happen.

Add simple query for data elements in a Treant.

Although the keys for all data elements are currently displayed by default via, e.g., Treant.data, it would be useful to be able to get a listing of data keys that match a query. This could be as simple as a Data.isin method that takes a string as input and outputs all keys in which that string is present; a sketch follows.
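
A minimal sketch of what such a method could look like, with a stand-in Data that keeps its keys in memory (isin and the keys() listing are assumptions, not existing interface):

class Data(object):
    def __init__(self, keys):
        self._keys = list(keys)

    def keys(self):
        return self._keys

    def isin(self, match):
        # return all data keys that contain the substring `match`
        return [key for key in self.keys() if match in key]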

Add an aggregator for dataset metadata

Although HDF5 files are able to store attributes on internal nodes, these are known to be slow to read/write, and there are fundamental size limits on them. What's more, we hope to support more data file backends in the future, so we don't want to rely on HDF5's neat features for metadata storage.

If treants had a .meta aggregator that worked similarly to the .data aggregator, but instead storing dictionaries of python data structures as flat yaml files in the directory structure of the treant, that would be swell. This would allow metadata to live alongside data, whatever its form, and it would be human-readable and editable.
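
A minimal sketch of such an aggregator, assuming PyYAML and a flat meta.yml file inside the Treant's directory (names are illustrative only):

import os
import yaml

class Meta(object):
    def __init__(self, treant_path, filename='meta.yml'):
        self._path = os.path.join(treant_path, filename)

    def _load(self):
        # read the whole flat YAML file, tolerating a missing or empty file
        if not os.path.exists(self._path):
            return {}
        with open(self._path) as f:
            return yaml.safe_load(f) or {}

    def __getitem__(self, key):
        return self._load()[key]

    def __setitem__(self, key, value):
        meta = self._load()
        meta[key] = value
        with open(self._path, 'w') as f:
            yaml.safe_dump(meta, f, default_flow_style=False)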

Drop pytables backend in favor of json?

I recently finished work on JSON backends for Treant and Group state files; the current status can be checked out on the exp-serial_backends branch. This was a follow-up to the work I had done previously but never quite finished on the YAML backend (this also works).

What I didn't expect was better performance from JSON than from the HDF5 format (see this gist for some quick comparisons), although in hindsight it makes sense: for storing and retrieving a few strings, deserializing JSON is less costly than dealing with the HDF5 library, which is better suited to long reads of large datasets. It gave me a thought:

Should we drop pytables as a backend in favor of JSON?

I'm leaning in favor of this conclusion, for several reasons:

  1. Since state files only store metadata consisting of tags and categories (at the moment), and since these probably number fewer than 100 for a given Treant, performance is better than with HDF5.
  2. State files are human-readable with any text editor, and also greppable (see the short illustration after this list). :D
  3. JSON is a serialization format, so streaming a state file through a pipe is trivial. Anything that deals in JSON can deal in state files.
  4. No HDF5 or PyTables dependency.
  5. JSON is more flexible in some ways. Instead of just tags and categories being stored as strings, they could also be floats, ints, booleans.
  6. Unicode string storage, which does not work in HDF5 under PyTables, is no longer an issue. This was also the only barrier to making datreant work fully under Python 3.
  7. No need for fixed-length strings, which are currently not handled elegantly in the PyTables backend.
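
To make points 2 and 3 concrete, a hypothetical state file under this proposal needs nothing beyond the standard library (contents here invented for illustration):

import json

state = {'version': '0.6.0',
         'tags': ['elm', 'thorn'],
         'categories': {'status': 'complete', 'temperature': 300.0}}

with open('Treant.json', 'w') as f:
    json.dump(state, f)

# human-readable, greppable, and trivially streamed or piped
with open('Treant.json') as f:
    print(json.load(f)['tags'])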

Some downsides:

  1. For derived classes in other packages such as mdsynthesis.Sim, storage of big things like arrays of resnums gets tricky and could degrade performance quickly (though with the new Topology object, Universe persistence may change greatly).

This move would remove a fundamental dependence on HDF5 and PyTables, which could allow splitting off datreant.data as a separate repo that upon import gives the Treant.data limb for convenient storage. That's a separate issue, though, and merits its own discussion.

Fancy indexing should work for Group.members and Bundle

It should be possible to get back a Bundle by using numpy-style fancy indexes on Group.members and Bundle, which could be lists of indices to return or boolean indexes. This currently does not work.

This issue was originally brought up in #25.
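
The proposed behavior, for illustration (not yet implemented; Bundle construction here is an assumption):

import datreant.core as dtr

b = dtr.Bundle('maple', 'oak', 'elm')

b[[0, 2]]               # Bundle of maple and elm
b[[True, False, True]]  # boolean mask: Bundle of maple and elm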

Add globbing syntax to Bundle.

Bundle is intended to be a useful grouping tool for Treants without the persistence of a Group. It would be awesome if it could run any strings it receives as input through glob.glob to grab whole sets of Treants from the filesystem easily.
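
For now the same effect can be had by globbing explicitly; the proposal is for Bundle to do this internally (Bundle accepting an iterable of paths is an assumption):

import glob
import datreant.core as dtr

# grab every Treant whose directory matches the pattern
b = dtr.Bundle(glob.glob('sims/run_*'))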

pass methods and attributes through containers

I find writing

G = Group(pathname)
names = [x.name for x in G.members]

tedious. Furthermore, if G contains other groups, it sort of breaks (or I would have to manually and recursively flatten the group).

Instead I'd like some magic to happen:

names = G.names
status = G.categories['status']
bool_finished = (G.categories['status'] == "complete")
bool_protein = G.tags["protein"]

where lists (or np arrays) are produced with the appropriate content from the members.

Essentially, override __getattr__() and friends to pass all unknown methods and attributes down to the lower level and apply them if possible. Do this recursively. Possibly return nested lists.
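
A rough sketch of that pass-through (not datreant's actual behavior; recursion into member Groups would still be needed on top of this):

class AttrBroadcast(object):
    def __init__(self, members):
        self._members = members

    def __getattr__(self, attr):
        # apply the unknown attribute to each member, collecting the results
        return [getattr(member, attr) for member in self._members]

so that AttrBroadcast(G.members).name gives the list of member names asked for above.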

Not sure how to deal with a group that includes itself and to break infinite recursion. Possibly forbid it in the first place (would also please Zermelo).

Admittedly a bit dirty because there are the objects own methods that you cannot shadow... so feel free to close this issue with wontfix if you want to keep things tidy ;-)

Possibly related to #23.

Remove Group in favor of Bundle only?

This issue was first broached in #35 here.

The Group object is a Treant that can store information on other Treants, giving access to them as members. It was the first aggregation scheme built for datreant/mdsynthesis, predating the in-memory, non-persistent Bundle object that works similarly. There are fundamental technical problems with making Groups work reliably.

Because they store paths to their members persistently, this information can quickly go out of date. A Group will go looking for members when they go missing, but this machinery gives no guarantee it can find them, and old Groups tend to break.

I propose we ditch this object in favor of Bundle usage, which can be persisted with a Bundle.dump method or similar, and reloaded later. I think we should move away from storing file paths and the like within state files since these tend to go stale more quickly than one might think.
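
A hypothetical dump/load pair (none of this exists yet; Bundle construction and the abspath attribute are assumptions):

import json
import datreant.core as dtr

def dump(bundle, filename):
    # write member paths out explicitly, as a deliberate user action
    with open(filename, 'w') as f:
        json.dump([treant.abspath for treant in bundle], f)

def load(filename):
    # rebuild a Bundle from previously dumped paths
    with open(filename) as f:
        return dtr.Bundle(json.load(f))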

I've been experimenting with this idea in the nogroups branch here, which builds on the ideas of #36 and begins work toward #23 and #30.

What do you think? What do you currently use Groups for, and would Bundle fill this need?

Create organization website, set up Travis to push development doc builds

Read the Docs doesn't appear to play well with setup.py install_requires, which we want to keep so that dependencies get handled automatically with setuptools or pip. Even though we can mock our imports, keeping install_requires will pull the dependencies anyway, which fails the build.

Instead, it makes sense to create a website for the organization at http://datreant.org, a domain name I already own. We'll likely go the same route as http://mdanalysis.org with a simple Jekyll site, perhaps with a different color theme, and then set up Travis to push development docs somewhere in the tree.

Build mechanism into state files to reclaim space

HDF5 doesn't (yet) reclaim space from deleted nodes. Therefore, in principle TreantFiles will slowly grow if enough universe definitions / selections are added/replaced. We need to assess how big a problem this is, and what the best solution is for "cleaning" state files that have this potential for bloat.
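
In the meantime, PyTables can already copy the live nodes of a bloated file into a fresh one, reclaiming dead space; a possible cleaning step might look like this (file names illustrative):

import os
import tables

# copy only live nodes into a fresh file, then swap it into place
tables.copy_file('Treant.h5', 'Treant.repacked.h5', overwrite=True)
os.rename('Treant.repacked.h5', 'Treant.h5')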

Experiment: Remove statefile backends; make statefile contents persistent property

This is meant as a discussion point for an experimental idea that may not work, and also a brain dump before I forget.

Currently every Treant has a TreantFile object that serves as its interface to its persistent state file on disk, which is JSON and maps directly to a Python object. The TreantFile features methods like add_tags that do all the detailed work needed to store information in the file (including file locking and serialization), so that friendlier user-facing interfaces to these components can exist at the Treant level, such as Treant.tags.add, without having to handle details like atomicity of changes. This also separates the underlying implementation from the user interface.

This scheme was necessary when storage of tags, functionally a set, needed to be turned into tables stored in an HDF5 file. But since the underlying serialization format is now JSON, and since this maps directly into a Python structure already, this interface layer is perhaps no longer necessary.

The stored data could be made available as a Treant._state property, which can be accessed and modified with context managers as:

import datreant as dtr

t = dtr.Treant('sprout')

# applies shared lock for reading, deserializes into t._state
with t._read():
    tags = t._state['tags']

# applies exclusive lock for writing, deserializes into t._state,
# then reserializes when finished
with t._write():
    # tags are supposed to be non-repeated, with no sense of order,
    # but JSON needs a list
    t._state['tags'] = list(set(t._state['tags'] + ['stickly']))

The user-facing API is still Limb-based, but the above access pattern would be how the internal methods of Limbs could look. That is, they directly modify the state data structure instead of doing it through a TreantFile's methods.
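
The context managers themselves might look something like this (a sketch using POSIX advisory locks via fcntl; datreant's actual locking scheme may differ):

import fcntl
import json
from contextlib import contextmanager

class TreantSketch(object):
    def __init__(self, statefile):
        self._statefile = statefile
        self._state = None

    @contextmanager
    def _read(self):
        with open(self._statefile, 'r') as f:
            fcntl.lockf(f, fcntl.LOCK_SH)   # shared lock: many readers
            try:
                self._state = json.load(f)
                yield self
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)

    @contextmanager
    def _write(self):
        with open(self._statefile, 'r+') as f:
            fcntl.lockf(f, fcntl.LOCK_EX)   # exclusive lock: one writer
            try:
                self._state = json.load(f)
                yield self
                f.seek(0)
                json.dump(self._state, f)
                f.truncate()
            finally:
                fcntl.lockf(f, fcntl.LOCK_UN)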

This makes it possible to write custom Limbs that can modify the state file without needing to create a new Treant type with its own TreantFile backend subclass. It also means that all the details of how, say, the Tags Limb stores its data live in its implementation and not somewhere else.

Disadvantage: Limbs have to be written with more care, since their methods need to keep the statefile in a consistent state. Since there would be no "gatekeeper" TreantFile class, any change could be made to the data structure. A poorly written method in a Limb could wipe out the entire contents of the state file.

Make name and location setters get exclusive lock

Since it's possible for the name and location properties to change the path to a Treant's statefile, these should obtain an exclusive lock on the statefile before being applied. This does not happen at the moment. Also, other instances of the Treant in other Python sessions will not be able to find the new location of their state files; the Foxhound object should be used by the Treant's File backend to sniff it out when/if this happens.

Tags should respond to set operators

Since the Tags limb functions as a set, it should behave as one. Comparing the tags of two Treants with something like:

from datreant.core import Treant

t1 = Treant('sprout')
t2 = Treant('bark')

t1.tags.add('elm', 'thorn')
t2.tags.add('elm', 'fork')

# should give a set giving the union of both tags
a_lovely_union = t1.tags | t2.tags

See the behavior of sets for the full set of operations that need defining; Bundle also serves as an example of how to define set-like behavior.
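
A minimal sketch of the operators, with a stand-in Tags that keeps its tags in memory (the real limb reads them from the state file):

class Tags(object):
    def __init__(self, *tags):
        self._tags = set(tags)

    def __iter__(self):
        return iter(self._tags)

    def __or__(self, other):
        # union, as in t1.tags | t2.tags
        return set(self) | set(other)

    def __and__(self, other):
        # intersection: tags common to both
        return set(self) & set(other)

    def __sub__(self, other):
        # difference: tags present here but not in other
        return set(self) - set(other)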

Remove location keyword for Treant creation; use path instead + new keyword for forcing new

Related to MDSynthesis #29. Currently, how the name string is parsed when creating a new Treant is fundamentally different from when a Treant is being regenerated. We can unify these by treating it as a path at all times: creating a new Treant if one is not found at the given location, loading an existing one if the path contains a unique Treant (only one state file), or forcing the creation of a new one at the given path with a new=True keyword.

Add a `flatten` method to collections

Bundle and Group.members should provide a flatten method that returns a Bundle giving all Treants within the collection that are not themselves Groups, plus all Treants within any Groups in the collection, recursively. It should take a keyword argument depth that limits the recursion depth of the result; a sketch follows.
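
A sketch of the proposed method as a free function (the real version would live on Bundle and Group.members; constructor semantics here are assumptions):

from datreant.core import Bundle, Group

def flatten(collection, depth=None, _level=0):
    treants = []
    for member in collection:
        if isinstance(member, Group):
            # recurse into Groups until the requested depth is reached
            if depth is None or _level < depth:
                treants += flatten(member.members, depth, _level + 1)
        else:
            treants.append(member)
    return Bundle(treants)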

Support Python 3.4.

It's important that we support Python 3.4. The only barrier I see at the moment is related to unicode handling on the part of PyTables. For example, with Python 3.4 we get:

import datreant as dtr

t = dtr.Treant('treebeard')
t.tags.add('bark')
'bark' in t.tags()
# False

because 'bark' here is a unicode string, while the 'bark' read from the file into the set of tags is of type bytes, i.e. a Python 2.7-style string. Opening the same Treant in Python 2.7 yields True for 'bark' in t.tags().

This is related to PyTables/PyTables#268. Until a cleaner solution appears there, we could wrap writes and reads to files to convert inputs and outputs of strings, but this might be a performance hit at this level.
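
Such a wrapper could be as small as the following, in Python 3 terms (where exactly it hooks into the PyTables reads and writes is left open):

def encode(value):
    # coerce text to bytes on the way into PyTables
    return value.encode('utf-8') if isinstance(value, str) else value

def decode(value):
    # coerce bytes back to text on the way out
    return value.decode('utf-8') if isinstance(value, bytes) else value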

Use pylint to de-lint codebase

I ran py.test --pylint on datreant today, and it pointed out a large number of things that could be improved. However, it's a bit of a task, so we should probably figure out which recommendations we wish to follow and which to ignore.

Need mechanisms for moving filesystem objects

Corollary to #54.

We need convenient mechanisms for moving filesystem objects (Treants, Trees, Leaves) to other locations in the filesystem. For Treants in particular this may be tricky to do well, since we should consider file locking and how another instance of the same Treant might find its moved statefile (using Foxhound).

Read-only files give exception on Treant regeneration

Accessing a read-only Treant in another user's directory, for example, currently does not work. This is because an existing proxy file cannot be opened for writing, and the OSError exception is not caught. Recent changes to the TreantFile constructor may also yield an exception under read-only access.

Read-only access to a Treant should be made to work again.

Add Tags.regex and Tags.glob

We want to be able to match tags with regexes or globs. Should add two new methods Tags.regex and Tags.glob in the same vein as Tags.fuzzy for doing this. Should return matching tags as a tuple. Must have analogous methods for AggTags, too. Again, see AggTags.fuzzy.
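
Sketches of the two methods, written here as free functions over an iterable of tags (the real methods would live on Tags and AggTags):

import re
from fnmatch import fnmatch

def tags_regex(tags, pattern):
    # return tags matching the regular expression, as a tuple
    return tuple(tag for tag in tags if re.search(pattern, tag))

def tags_glob(tags, pattern):
    # return tags matching the glob pattern, as a tuple
    return tuple(tag for tag in tags if fnmatch(tag, pattern))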

Create View object

A Bundle is an ordered set of Treants (and their subclasses), offering convenient mechanisms for working with them in aggregate. A similar but more general object should exist for manipulating Trees and possibly Leaves. Not sure what this should look like yet, however.

Crash when adding numpy array in data

I cloned datreant yesterday to give it a try (I'm up to date with this repository). The following script exits with a segfault:

import datreant
import numpy as np
sprout = datreant.Treant('sprout')
print "Adding list"
sprout.data.add('test', [1, 2, 3])
print "Adding np.array"
sprout.data.add('test2', np.array([1, 2, 3]))

Adding the list works, but adding the np.array fails in H5Open. If I re-run the script without removing the sprout directory, it works. I use Ubuntu 14.04; the current version of the HDF5 library there is 1.8.11.

Is it possible to copy part of a Treant?

On the cluster I may want to add several simulations and all analyses that take a long time to run into a datreant. But I don't necessarily want to have the complete datreant on my workstation all the time (for space reasons, among other things).

Is it possible to only copy part of a datreant? Extra points if I only need to call rsync on a subfolder.

Group.members should also allow indexing by member name

Currently, Group.members[:] yields a list of all members, and slicing by index is allowed to get subsets, since members do have an order. However, it would be most convenient to be able to select members by name, such as with:

 Group.members['lark']

or:

 Group.members[['lark', 'hark']]

as is used by pandas DataFrames to select multiple columns. This should yield a Bundle object (instead of a list) containing all members that match the given names. Since names are not required to be unique, this could be many more than the number of names supplied. This would allow calls such as:

  Group.members[['lark', 'hark']].data['ionbinding']

which would retrieve a concatenation of all concatenable datasets named 'ionbinding' present in the Bundle, once issue #8 is addressed.

Complete test coverage for AggTags

datreant.core.agglimbs.AggTags currently has placeholders for tests in datreant.core.tests.test_collections.TestBundle, but these need to be filled in to actually do anything. Help wanted.

Add different tags to different data entries in a Treant

I want to collect observables from several hundred observations into one Treant object. In the simulations, different variables change. It would be nice if I could tag the data in a Treant instead of just adding tags to the Treant itself. I could then use the tags to query for all simulations that fulfill a certain condition (e.g. integration time step, protein, ...).

I'm aware that I could achieve the same thing already with groups. But I only want to use one Treant object because I would like to have just one folder that I need to copy from the remote cluster to my laptop. If I understand groups correctly, right now they just link to the Treant objects contained in them, which means I would still need to copy a lot of folders and maintain complicated rsync rules to copy only the data I want.

So another solution for me would be that I can move Treant objects into the same location on disk as the Group is.

Update docs in preparation for 0.6.0

Many new features and changes have been made since the docs were originally written (which was for MDSynthesis back in January). These need to be updated with fresh eyes in light of these changes, and then vetted by others.

I'll be working on this over the weekend. Anyone interested in helping to make them better after a first draft? @cing, @sseyler? I'll ping this issue when I've finished an alpha version.

Data interfaces: does it make sense to use Blaze as the backend?

Currently our backends for storing numpy arrays, pandas objects, and anything else are h5py, pandas.HDFStore, and cPickle, respectively. These work well enough for the moment, but I wonder what could be gained by using blaze.Data as the backend instead.

I still haven't collected my thoughts on this, but doing something like:

import datreant as dtr
import numpy as np
import pandas as pd

# toy data
df = pd.DataFrame(np.random.randn(1000, 3), columns=['a', 'b', 'c'])

# with new or existing treant, store the dataframe using default backend (pd.HDFStore)
t = dtr.Treant('sprout')
t.data['random dataframe'] = df

# get data back
d = t.data['random dataframe']   # d is a blaze.Data object

(df == d.data).all().all()
# True

Possible advantages:

  1. Using blaze.Data objects might make it easier to build more backends, in that they are already part of blaze. Any backend will also give the same interface.
  2. These objects can be used directly with dask for complex, scheduled out-of-core operations.

Possible and immediate disadvantages:

  1. Getting data using __getitem__ syntax above yields a different object than went in using __setitem__. This is annoying at best.
  2. The blaze API is far from stable, and it has a lot of rough edges just from playing with it right now. It will be a moving target for some time.
  3. Not all objects that the current scheme stores can be read by blaze.Data. For example, pandas.Panel objects stored in pandas.HDFStores. Though the blaze.Data interface would be common for many backends, it wouldn't work for all types of data. Not for "arbitrary" objects that we pickle, either.
  4. Importing blaze is very slow at the moment.

I think there is something worth thinking about here, so this issue should serve as a discussion point. Did I miss anything? Is there a better way that one might use blaze as a/the data handling backend in datreant?

Add a `describe` method to treants that displays each aggregator output together

It would be nice to have one method that gives summary information on the components of a treant. It could scrape up the print representations of each Limb and perhaps abbreviate them in some way. It may require some thought to make this mechanism work usefully for new Limbs patched in, such as the selections used in MDSynthesis.Sim.

Allow globbing syntax for Data.remove().

Since datasets are stored in a directory structure, and since their names reflect this, it would be fairly easy to make deletions using globbing. This would be a great convenience when some datasets matching a pattern should be removed without removing others.
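
From the outside, the same effect can be sketched today (assuming the Data limb exposes keys() and remove() as described elsewhere in this tracker):

import fnmatch
import datreant as dtr

treant = dtr.Treant('observables')

# remove every dataset whose key matches the glob pattern
for key in fnmatch.filter(treant.data.keys(), 'short_sims/*/obs_*'):
    treant.data.remove(key)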

A static view that can be filtered/sorted should exist

While I was looking for a solution to my problem in #23, I noticed that I actually want something even more powerful. Just having convenient access to a leaf in the treant is not enough; I also want to be able to select special subsets of the leaves. An additional view class that can select specific subsets of a leaf would solve that. I have made a gist with an example implementation of a TreantView and how I use it.

copy treants on remote machines

Hi, as an HPC user, I often find myself generating input directories (like for gromacs) and then uploading them to and running on an HPC cluster.

Is this use case covered? (Or are there plans for it?)

Create AggLimb `AggCategories`

We want an aggregation limb (AggLimb) for Categories similar to that which exists for Tags. It should give useful ways for dealing with the categories stored among the Treants of a Bundle. I propose that it have at least the following:

  1. the same API methods as Categories: add, remove, clear, keys.
  2. a groupby method that gives groupings of Treants based on category values given keys.
  3. a __getitem__ that takes (sketched below):
    1. a string giving a key, returning a list of values in member order for that key; None is given for members in which the key is not present.
    2. a list of keys, giving a list of lists ordered by key, each inner list giving values in member order; None as a value means the key is not present for that member (as above).
    3. a set of keys, giving a dict of lists, with the keys as keys and lists of values as above.
  4. anything else I've failed to think of.
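
A sketch of the proposed __getitem__ dispatch (the member interface here is an assumption: each member's categories is treated as a plain mapping):

class AggCategories(object):
    def __init__(self, members):
        self._members = members   # Treants in a Bundle

    def __getitem__(self, keys):
        members = self._members
        if isinstance(keys, str):
            return [m.categories.get(keys) for m in members]
        elif isinstance(keys, list):
            return [[m.categories.get(key) for m in members] for key in keys]
        elif isinstance(keys, set):
            return {key: [m.categories.get(key) for m in members]
                    for key in keys}
        else:
            raise TypeError('keys must be a str, list, or set')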

Add `filter` convenience method to AggTags

It's possible to filter Treants from a Bundle directly from their tags with:

import datreant.core as dtr

b = dtr.discover('.')
b[b.tags['fluffy']]

where the boolean index returned by b.tags['fluffy'] selects members matching the given tag expression. It would be nice to achieve the same effect with:

b.tags.filter('fluffy')

for the terminally lazy. AggTags.filter would take a tags expression, and could even include useful kwargs such as fuzzy for applying fuzzy matching to each tag given.
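
A sketch of the method (self._collection, the parent Bundle, is an assumption about AggTags internals):

def filter(self, expression):
    # use the boolean index from __getitem__ to select matching members
    return self._collection[self[expression]]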

access data objects by 'subgroup'

I would like to be able to access data objects by subgroup as well, assuming 'groups' are separated by a '/'. So I would create the Treant like this:

import datreant as dtr

treant = dtr.Treant('observables')

treant.data.add('long_sim/obs_1', obs_1)
treant.data.add('long_sim/obs_2', obs_2)

treant.data.add('short_sims/1/obs_1', short_1_obs_1)
treant.data.add('short_sims/1/obs_2', short_1_obs_2)
treant.data.add('short_sims/1/obs_3', short_1_obs_3)

treant.data.add('short_sims/2/obs_1', short_2_obs_1)
treant.data.add('short_sims/2/obs_2', short_2_obs_2)
treant.data.add('short_sims/2/obs_3', short_2_obs_3)

Now it would be great if I could do something like this:

>>> long_sim = treant['long_sim']
>>> print(long_sim.data)
Data
====
'obs_1'
'obs_2'

I haven't found anything in the docs about this. Is there a possible query for this? Can groups be used for something like this?
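
One possible workaround with the current interface is to select data keys by prefix (assuming the Data limb lists its keys via keys()):

import datreant as dtr

treant = dtr.Treant('observables')

prefix = 'long_sim/'
long_sim_keys = [key for key in treant.data.keys() if key.startswith(prefix)]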

Make addition of Treants yield a Bundle

Bundles function somewhat as throwaway Groups, giving built-in methods for dealing with whole collections of Treants. It would be particularly pythonic if addition between Treants creates a Bundle, and addition between a Bundle and a Treant makes a new Bundle with that Treant as a member.
