datreant / mdsynthesis
a logistics and persistence engine for the analysis of molecular dynamics trajectories
Home Page: http://mdsynthesis.readthedocs.org
License: GNU General Public License v2.0
Sim.selections.keys() # works
Sim.categories.keys() # works
# but
Sim.data.keys() # doesn't work
Sim.universes.keys() # doesn't work
If I'm using a dict-like object, it makes sense to have a keys method for it. I had a quick look through the code, and it looks like the data aggregator is different, but ideally it should behave like the others? Maybe link keys to data._list?
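A minimal sketch of the suggested fix, using a toy aggregator (not the real MDSynthesis class) whose keys() delegates to an internal _list() method:

```python
# Hypothetical sketch: a dict-like aggregator that gains a keys() method
# by delegating to an internal _list(), as suggested above.
class DataAggregator:
    def __init__(self, entries):
        self._entries = dict(entries)

    def _list(self):
        # stand-in for the real backend listing of stored handles
        return list(self._entries)

    def keys(self):
        # delegate to _list so keys() behaves like a dict's keys()
        return self._list()

    def __getitem__(self, key):
        return self._entries[key]

agg = DataAggregator({'rmsd': 1, 'rgyr': 2})
print(sorted(agg.keys()))  # ['rgyr', 'rmsd']
```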
In [1]: import mdsynthesis as mds
In [2]: S = mds.Sim('cg')
In [5]: S.universes.add('main', ['topol.tpr','cg.xtc'])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-f7469218f877> in <module>()
----> 1 S.universes.add('main', ['topol.tpr','cg.xtc'])
/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/aggregators.py in add(self, handle, topology, *trajectory)
333 outtraj.append(traj)
334
--> 335 self._backend.add_universe(handle, topology, *outtraj)
336
337 if not self.default():
/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in inner(self, *args, **kwargs)
224 self._exlock(self.handle)
225 try:
--> 226 out = func(self, *args, **kwargs)
227 finally:
228 self.handle.close()
/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in add_universe(self, universe, topology, *trajectory)
880
881 # add topology paths to table
--> 882 table.row['abspath'] = os.path.abspath(topology)
883 table.row['relCont'] = os.path.relpath(topology, self.get_location())
884 table.row.append()
/usr/lib/python2.7/posixpath.pyc in abspath(path)
365 def abspath(path):
366 """Return an absolute path."""
--> 367 if not isabs(path):
368 if isinstance(path, _unicode):
369 cwd = os.getcwdu()
/usr/lib/python2.7/posixpath.pyc in isabs(s)
59 def isabs(s):
60 """Test whether a path is absolute"""
---> 61 return s.startswith('/')
62
63
AttributeError: 'list' object has no attribute 'startswith'
In [6]: S.universes.add('main', 'topol.tpr','cg.xtc')
---------------------------------------------------------------------------
NoSuchNodeError Traceback (most recent call last)
<ipython-input-6-f176da1cef98> in <module>()
----> 1 S.universes.add('main', 'topol.tpr','cg.xtc')
/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/aggregators.py in add(self, handle, topology, *trajectory)
333 outtraj.append(traj)
334
--> 335 self._backend.add_universe(handle, topology, *outtraj)
336
337 if not self.default():
/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in inner(self, *args, **kwargs)
224 self._exlock(self.handle)
225 try:
--> 226 out = func(self, *args, **kwargs)
227 finally:
228 self.handle.close()
/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in add_universe(self, universe, topology, *trajectory)
872 '/universes/{}'.format(universe), 'topology')
873 self.handle.remove_node(
--> 874 '/universes/{}'.format(universe), 'trajectory')
875
876 # construct topology table
/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in remove_node(self, where, name, recursive)
1779 """
1780
-> 1781 obj = self.get_node(where, name=name)
1782 obj._f_remove(recursive)
1783
/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in get_node(self, where, name, classname)
1614 # Now we have the definitive node path, let us try to get the node.
1615 if node is None:
-> 1616 node = self._get_node(nodepath)
1617
1618 # Finally, check whether the desired node is an instance
/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in _get_node(self, nodepath)
1553 return self.root
1554
-> 1555 node = self._node_manager.get_node(nodepath)
1556 assert node is not None, "unable to instantiate node ``%s``" % nodepath
1557
/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in get_node(self, key)
434
435 if self.node_factory:
--> 436 node = self.node_factory(key)
437 self.cache_node(node, key)
438
/home/richard/.local/lib/python2.7/site-packages/tables/group.pyc in _g_load_child(self, childname)
1184 childname = join_path(self._v_file.root_uep, childname)
1185 # Is the node a group or a leaf?
-> 1186 node_type = self._g_check_has_child(childname)
1187
1188 # Nodes that HDF5 report as H5G_UNKNOWN
/home/richard/.local/lib/python2.7/site-packages/tables/group.pyc in _g_check_has_child(self, name)
400 raise NoSuchNodeError(
401 "group ``%s`` does not have a child named ``%s``"
--> 402 % (self._v_pathname, name))
403 return node_type
404
NoSuchNodeError: group ``/`` does not have a child named ``/universes/main/trajectory``
Group.members
and Bundle
are intended to make it easy to manipulate many Containers at once, but currently they only give access to the objects themselves. It would be useful to include methods that yield aggregate information from these collections. Both objects would have these methods in common.
For example, we could have Bundle.data, which gives access to concatenations of stored pandas data sets. It grabs any datasets it can that match the given handle and tries to concatenate them; this would be useful for quickly aggregating and manipulating ensemble data. We could also have Bundle.tags, which gives all tags present in the collection, perhaps with keywords for 'any' and 'all' criteria for what to return.
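As a sketch of the Bundle.data concatenation idea (the member layout and the added 'member' column are assumptions for illustration, not the real API):

```python
# Sketch: gather every member's dataset stored under a given handle and
# concatenate them with pandas, tagging rows by member name.
import pandas as pd

def bundle_data(members, handle):
    """Collect dataset `handle` from each member that has it and concatenate."""
    frames = []
    for name, datasets in members.items():
        if handle in datasets:
            df = datasets[handle].copy()
            df['member'] = name  # keep track of which member each row came from
            frames.append(df)
    return pd.concat(frames, ignore_index=True)

members = {
    'sim1': {'rmsd': pd.DataFrame({'t': [0, 1], 'rmsd': [0.0, 1.2]})},
    'sim2': {'rmsd': pd.DataFrame({'t': [0, 1], 'rmsd': [0.0, 0.9]})},
}
print(bundle_data(members, 'rmsd').shape)  # (4, 3)
```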
The problem is pretty straightforward:
import mdsynthesis as mds
s=mds.Sim("marlar")
s.data['poop']=25
s.data['poop']
25
s.data['poop']=50
s.data['poop']
25
s.data.add('poop', 50)
s.data['poop']
25
s.data['poop2']=[1,2]
s.data['poop2']
[1, 2]
s.data['poop2']=[1,2,3]
s.data['poop2']
[1, 2]
I dug a little into this one, and the cause of the issue seems to be the file mode 'ab+': it simply doesn't work for pickle dumping (see the code at mdsynthesis/core/persistence.py:1962, where that mode is used). I can submit a one-line PR that changes the mode to 'wb+' if there's no reason not to.
Here's the MDSynthesis-free root of the problem:
import pickle
pickle.dump(20, open("marlar.pkl", "ab+"))
pickle.load(open("marlar.pkl","rb"))
20
pickle.dump(30, open("marlar.pkl", "ab+"))
pickle.load(open("marlar.pkl","rb"))
20
pickle.dump(30, open("marlar.pkl", "wb"))
pickle.load(open("marlar.pkl","rb"))
30
When multiple trajectories are needed for a universe definition, it may be useful to allow globbing syntax for the paths stored so that new trajectory files that match the pattern are picked up by the universe. Then again, this feature could be troublesome. How feasible is it to include, and what are some pitfalls?
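As a sketch of how stored globs could be expanded at load time (the function name is hypothetical):

```python
# Sketch: store glob patterns alongside plain paths and expand them when
# the universe is loaded, so newly written trajectory files matching the
# pattern are picked up automatically.
import glob

def resolve_trajectories(patterns):
    """Expand each stored path; globs yield all current matches, sorted."""
    files = []
    for pat in patterns:
        matches = sorted(glob.glob(pat))
        # a pattern with no matches is kept as a literal path, so plain
        # filenames behave exactly as before
        files.extend(matches if matches else [pat])
    return files

print(resolve_trajectories(['topol.tpr']))  # ['topol.tpr']
```

One pitfall this makes visible: a stale or mistyped pattern silently becomes a literal path, and the set of files a universe sees can change between sessions, which complicates reproducibility.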
The new MDAnalysis release 0.11.0 breaks parts of the API and it is possible that MDSynthesis needs to be migrated and pinned to MDAnalysis >= 0.11.0.
Despite the advantages of structuring Containers with their stored data at the filesystem level instead of shoving them into a single database, the major disadvantage is that we have to be careful that file locks are working to ensure against concurrency corruption. We need stress tests of the file locking mechanisms for:
We also need to test that changes in filesystem elements that occur through Container properties (e.g. #19) are also handled gracefully, though this will almost certainly require the underlying ContainerFile object to use the Foxhound to fetch its new path.
The online docs do not contain the doc strings of the individual classes and functions. There should be a module/class/function reference section.
Pytest is our testing framework of choice, and some basic tests have been written for Container elements (tags and categories), but these need to be expanded to cover all components of Sims and Groups. This includes:
Additional, less atomic tests can be added later. These may include, e.g., organizational patterns:
In [39]: S = mds.Sim('adk')
In [40]: u = mda.Universe('adk.psf',['adk_dims.dcd', 'adk_dims.dcd'])
In [41]: S.universe = u
In [42]: S.udef.trajectory
Out[42]: u'/home/richard/test/mdsynthesis/adk_dims.dcd'
I'm not exactly sure what's going on here, but I can't store a 1D DataFrame. Retrieving a (10,1) DataFrame gives the error "TypeError: Index(...) must be called with a collection of some kind, None was passed":
import mdsynthesis as mds
import numpy as np
import pandas as pd
s = mds.Sim('marklar')
s.data.add('test1',pd.DataFrame(np.zeros((1,10))))
print s.data["test1"] #Good
s.data.add('test2',pd.Series(np.zeros((10,))))
print s.data["test2"] #Good
s.data.add('test3',pd.DataFrame(np.zeros((10,10))))
print s.data["test3"] #Good
s.data.add('test4',pd.DataFrame(np.zeros((10,1))))
print s.data["test4"] #Not Good
I'm pretty sure this is actually a problem with writing the file to disk because the h5 file seems weird:
z=pd.HDFStore("test4/pdData.h5", 'r')
>>> z
<class 'pandas.io.pytables.HDFStore'>
File path: pdData.h5
/main [invalid_HDFStore node: sequence item 0: expected string, numpy.int64 found]
Any idea how to fix this?
This would be useful when loading a trajectory is rather slow, especially if the universe comprises many XTC trajectories, since these are indexed by MDAnalysis immediately on load. Also, being able to load a subset of the trajectories may be useful for the same reason until we get persistent indexes.
Trying to add a category to a Sim with
s.categories.add('temperature', 303)
will raise no exception, but also won't add the intended category.
Currently, Group.members[:] yields a list of all members, and slicing by index is allowed to get lists of a subset, since members do have an order. However, it would be most convenient to be able to select members by name, such as with::

    Group.members['lark']

or::

    Group.members[['lark', 'hark']]

as is used by pandas DataFrames to select multiple columns. This should then yield a Bundle object, instead of a list, containing all members that matched the names given. Since names are not required to be unique, this could be many more than the number of names supplied. This would allow calls such as::

    Group.members[['lark', 'hark']].data['ionbinding']

which would retrieve a concatenation of all concatenatable datasets matching the name 'ionbinding' present in the Bundle, once issue #8 is addressed.
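A toy sketch of the proposed name-based selection semantics (Members and Bundle here are stand-ins for the real classes):

```python
# Sketch: index/slice access keeps list semantics; name-based access
# returns a Bundle of every member matching the given name(s).
class Bundle:
    def __init__(self, members):
        self.members = list(members)

class Members:
    def __init__(self, members):
        # members are (name, object) pairs, kept in order
        self._members = list(members)

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            # index-based access keeps the current list behavior
            return self._members[key]
        names = key if isinstance(key, list) else [key]
        # names need not be unique, so this may return more members
        # than names supplied
        return Bundle([m for m in self._members if m[0] in names])

m = Members([('lark', 1), ('hark', 2), ('lark', 3)])
print(len(m[['lark', 'hark']].members))  # 3
```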
At the moment each Sim and Group gets a logging instance corresponding to Sim.<name> or Group.<name>, respectively, and these are used to output information to the user. So far there are no written rules as to when the logger should be used and when an exception should be raised instead.
It would be convenient (and more object oriented) if I could say
u = MDAnalysis.Universe(TPR, XTC)
s = Sim(name)
s.universes.add('anyname', u)
instead of s.universes.add('anyname', universe=[TPR, XTC, ...]).
Although I appreciate that this will make it more difficult to recreate the universe. Perhaps this should be tackled together with MDAnalysis/mdanalysis#173, which could supply the necessary state information to reconstitute the universe.
I haven't found a way to rename a universe --- am I missing something? I was looking for
Sim.universes.rename(old, new)
or at least
Sim.universes[new] = Sim.universes[old]
del Sim.universes[old]
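Assuming no such method exists yet, a minimal sketch of the rename semantics, with a plain dict standing in for the real universe store:

```python
# Sketch: re-key a stored universe definition under a new name,
# refusing to clobber an existing one.
def rename_universe(universes, old, new):
    if new in universes:
        raise ValueError("'{}' already exists".format(new))
    # pop + reinsert mirrors the proposed "assign then delete" idiom
    universes[new] = universes.pop(old)

u = {'main': ('topol.tpr', ['cg.xtc'])}
rename_universe(u, 'main', 'cg')
print(sorted(u))  # ['cg']
```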
Since the basic structure of MDSynthesis isn't entirely specific to molecular dynamics simulation, I think it might make sense to break the core out into a separate package. I'd been considering this for a while, but today I was contacted by someone with a use-case outside of MD, and I think it would be of benefit to MDSynthesis' core development to make it more general.
I've started a repository for this work here: https://github.com/dotsdl/datreant
Is anyone opposed to this? One thing to note is that datreant is BSD 3-clause licensed, making its use more permissive than MDSynthesis, which is GPLv2 (same as MDAnalysis). I technically need permission from everyone who has contributed code so far to make this change.
P.S. Opinions on the name are welcome. I wanted something that included 'dat' to indicate 'data', and some kind of word to indicate 'trees' (as in directory trees). An old D&D woodland creature ('treant') came to mind. :D
It may be useful for others in a lab group on a shared volume to be able to use Sims and Groups of others with read permissions but without write permissions. If Sims and Groups can generally be made to work with only read permissions for accessing their stored attributes/data, this would make this possible.
The main changes would come in the __init__ methods of each container.
I like using selections (and everything else) in a pythonic fashion (i.e. if it looks like a dict it should mostly behave like one even if under the hood it's all HDF5). One strength of MDS is hiding all the bookkeeping.
Therefore, it is annoying that accessing a non-existent selection, say, raises NoSuchNodeError (no idea what kind of exception this is) when I tried
try:
    sel = self.sim.selections[name]
except KeyError:
    # do something about it because selection 'name' is not stored in the sim
because from the syntax I expected to get a KeyError.
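A sketch of translating the backend exception into a KeyError at the dict-like boundary (toy classes; the NoSuchNodeError defined here stands in for the PyTables exception):

```python
# Sketch: the aggregator catches the backend's lookup failure and
# re-raises it as the KeyError a dict-like interface should produce.
class NoSuchNodeError(Exception):
    """Stand-in for tables.NoSuchNodeError."""

class Selections:
    def __init__(self, stored):
        self._stored = dict(stored)

    def _backend_get(self, name):
        # stand-in for the HDF5 node lookup
        if name not in self._stored:
            raise NoSuchNodeError(name)
        return self._stored[name]

    def __getitem__(self, name):
        try:
            return self._backend_get(name)
        except NoSuchNodeError:
            # translate into the exception callers of a dict-like expect
            raise KeyError(name)

sels = Selections({'protein': 'name CA'})
try:
    sels['missing']
except KeyError:
    print('KeyError raised as expected')
```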
We need convenient mechanisms for modifying universe definitions. One way to do this would be to have Universes.topology and Universes.trajectory properties that allow getting and setting of the corresponding elements. If it isn't too confusing, we could also add __setitem__ and __getitem__ to the underlying objects to allow setting and getting for any universe definition.
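As a sketch of the property idea (class and attribute names here are hypothetical):

```python
# Sketch: getter/setter properties over the elements of a universe
# definition; a real backend would persist the change to the state file.
class UniverseDefinition:
    def __init__(self, topology, trajectory):
        self._topology = topology
        self._trajectory = list(trajectory)

    @property
    def topology(self):
        return self._topology

    @topology.setter
    def topology(self, path):
        # setting replaces the stored topology path for this definition
        self._topology = path

    @property
    def trajectory(self):
        return list(self._trajectory)

udef = UniverseDefinition('topol.tpr', ['cg.xtc'])
udef.topology = 'new.tpr'
print(udef.topology)  # new.tpr
```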
The docs still state that this is alpha software but I think in truth it has been in stable use for more than a year. Isn't it time to slap a 1.0 on it?
(Or are we waiting for MDAnalysis 0.16.0?)
What is holding up a release?
At the moment, Universes and Members use only stored absolute paths for finding the files they need to generate universes and members, respectively. Relative paths from the Container's basedir are also stored, but they are not yet used. This breaks functionality for Sims and Groups when moving them around in a filesystem, even when the relative paths between these Containers and the files they require haven't changed.
The relative paths should be tried in the event the absolute path fails. If the file is found, the absolute path should be updated.
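A minimal sketch of the proposed fallback logic, with hypothetical names:

```python
# Sketch: prefer the stored absolute path; if it is gone, try the stored
# path relative to the Container's base directory instead.
import os

def locate(abspath, relpath, basedir):
    """Return a usable path to the file, preferring the absolute one."""
    if os.path.exists(abspath):
        return abspath
    candidate = os.path.join(basedir, relpath)
    if os.path.exists(candidate):
        # found via the relative path; the caller should now update the
        # stored absolute path to this value
        return os.path.abspath(candidate)
    raise IOError("file not found: {}".format(abspath))
```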
I was playing with the selections tool on Sims, and I bumped into a couple annoyances
ag = u.atoms[:10]
S.selections.add(ag)
# TypeError: object name is not a string: <AtomGroup with 10 atoms>
I guess this ties in with #25 in accepting premade MDA objects. This should be possible as AtomGroups are unambiguous based on a hash of their Universe (to uniquely identify this among S.universes) and then just ag.indices()
S.selections += 'something'
# TypeError: unsupported operand type(s) for +=: 'Selections' and 'str'
Using .add() feels weird to me; intuitively I want to += stuff in. Thoughts?
Here's a fun one for the test suite, MDSynthesis crashes when you open up scalar numpy arrays. This can easily be avoided by not storing scalars in the first place, but it's worth mentioning!
import mdsynthesis as mds
import numpy as np
s=mds.Sim("marlar")
s.data['harhar']=np.array(20)
s.data['harhar']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/aggregators.py", line 960, in __getitem__
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/aggregators.py", line 899, in inner
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/aggregators.py", line 1154, in retrieve
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 1504, in get_data
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 1819, in inner
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 1878, in get_data
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/Users/cing/Projects/h5py/h5py/_objects.c:2458)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/Users/cing/Projects/h5py/h5py/_objects.c:2415)
File "/Users/cing/anaconda/lib/python2.7/site-packages/h5py-2.5.0-py2.7-macosx-10.5-x86_64.egg/h5py/_hl/dataset.py", line 418, in __getitem__
selection = sel2.select_read(fspace, args)
File "/Users/cing/anaconda/lib/python2.7/site-packages/h5py-2.5.0-py2.7-macosx-10.5-x86_64.egg/h5py/_hl/selections2.py", line 92, in select_read
return ScalarReadSelection(fspace, args)
File "/Users/cing/anaconda/lib/python2.7/site-packages/h5py-2.5.0-py2.7-macosx-10.5-x86_64.egg/h5py/_hl/selections2.py", line 80, in __init__
raise ValueError("Illegal slicing argument for scalar dataspace")
ValueError: Illegal slicing argument for scalar dataspace
MDSynthesis stores the scalars just fine and I don't see anything wrong with reading them using h5py as per usual:
import h5py
f = h5py.File('marlar/harhar/npData.h5','r')
f['main'].value
20
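The failure looks like the generic zero-dimensional indexing problem rather than a storage problem; a numpy-only sketch of the same behavior, independent of MDSynthesis and h5py:

```python
# Sketch: a 0-d ("scalar") array cannot be sliced, mirroring the
# "Illegal slicing argument for scalar dataspace" error above, but
# empty-tuple indexing reads the value fine.
import numpy as np

a = np.array(20)   # zero-dimensional array, like the stored dataset
print(a.shape)     # ()
print(a[()])       # 20: empty-tuple indexing works on scalars
try:
    a[:]           # slicing a 0-d array fails
except IndexError as err:
    print('slicing failed:', err)
```

This suggests the fix is in how the retrieval code indexes the dataset, not in how the scalar is written.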
Since it's possible for the name and location properties to change the path to a Container's statefile, these should obtain an exclusive lock on the statefile before being applied. A possible problem with this is that it requires an open file descriptor. What is a good solution?
Just a little better error handling is needed here, or well, you could support Sims with the same path name as files but that could get really confusing!
In shell:
touch marlar
then in Python,
import mdsynthesis as mds
s=mds.Sim("marlar")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/containers.py", line 512, in __init__
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/containers.py", line 224, in _regenerate
File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 65, in containerfile
UnboundLocalError: local variable 'statefileclass' referenced before assignment
I sometimes get several simulations of colleagues that are all in one folder
sim
+-- foo.01.xtc
+-- foo.02.xtc
+-- foo.03.xtc
+-- foo.pdb
My current approach is to move these into separate folders, but I would like to be able to tell a Sim that there is more than one simulation in that folder. All simulations have exactly the same setup, so I treat them as an ensemble in my analysis instead of a single simulation.
I know I can just copy them to different directories and then create Sims in them; I'm just curious whether such a workflow would be possible with mdsynthesis.
Can I convert a Treant into a Sim and keep all its tags and categories?
Having all modules start with uppercase looks a bit weird and is not the recommendation of PEP8:
(I know that e.g. MDAnalysis is also not fully compliant but this is a new project and you still have a chance to do it right without p***ing off too many users...)
At the moment the Data aggregator doesn't allow iteration through datasets. This should be implemented. It should also allow accessing datasets with a list of data names.
Although Sims and Groups can be created and manipulated just fine with their built-in methods, it would be useful to start making some top-level functions that can do this as well, but potentially taking as input multiple Containers at once.
Some ideas:
mds.copy() could copy stored elements of one container into another new or existing container. It could include keyword arguments to indicate what to include in or leave out of the copy. It could also be made a method of Containers themselves, e.g. Sim.copy(sim, all=False, universes=None, ...).
mds.load(*containers) could yield a mds.Bundle object, which behaves like an ordered set of Containers. If a single container path is given, it will just spit out the loaded Container itself.
There is probably room for methods to add tags, categories, members, universes, and selections to any number of the appropriate Containers as convenience functions. These would also lend themselves to easily building a shell-level script for manipulating Containers and their data.
Since datasets are stored in a directory structure, and since their names reflect this, it would be fairly easy to make deletions using globbing. This would be a great convenience when some datasets matching a pattern should be removed without removing others.
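A sketch of glob-based deletion over stored dataset keys, with a plain dict standing in for the real data store:

```python
# Sketch: remove every stored dataset whose key matches a glob pattern,
# using fnmatch over the existing keys.
import fnmatch

def remove_matching(datasets, pattern):
    """Delete every dataset whose key matches `pattern`; return removed keys."""
    doomed = fnmatch.filter(list(datasets), pattern)
    for key in doomed:
        del datasets[key]
    return doomed

data = {'rmsd/run1': 1, 'rmsd/run2': 2, 'rgyr/run1': 3}
print(remove_matching(data, 'rmsd/*'))  # ['rmsd/run1', 'rmsd/run2']
print(sorted(data))                     # ['rgyr/run1']
```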
Need tests for the following:
Upon loading an existing Container, the corresponding ContainerFile subclass should do a version check between the version given by the state file and the current version of MDS. It should then run code that updates the schema to that used by the current version of MDS.
This will require, among other things, schema-migration support in each ContainerFile subclass, which can change each release.
Bundle is intended to be a useful grouping tool for Containers without the persistence of a Group. It would be awesome if it could run any strings it receives as input through glob.glob to grab whole sets of Containers from the filesystem easily.
HDF5 doesn't (yet) reclaim space from deleted nodes. Therefore, in principle SimFiles will slowly grow if enough universe definitions/selections are added and replaced. We need to assess how big a problem this is, and what the best solution is for "cleaning" state files that have this potential for bloat.
So in trying to write tests for the Tags system, I've had some trouble with the design. Currently it's something like:
class ContainerFile:
    def get_tags(self):
        # do all the work on getting

    def set_tags(self):
        # do some other work on setting of tags

class Tags:
    def __iter__(self):
        return iter(self._containerfile.get_tags())

    def add(self, things):
        # do some work
        self._containerfile.add_tags(processed_things)
So when getting/setting tags, the work is split between two classes... Ideally all the work should belong to the Tags class, so something more like:
class ContainerFile:
    # doesn't have to know it has tags

class Tags:
    def get_tags(self):
        table = self._containerfile.handle.get_node()
So any aggregators plug into the container file and use its API, i.e. I could write a new aggregator without having to modify ContainerFile.
I'm not 100% sure how this would work with all the file decorators; maybe generic Containerfile.read_table and Containerfile.write_table methods which aggregators could use... Maybe a .get_read_handle() which returns handle with the _read_state lock.
Thoughts?
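A toy sketch of the generic-accessor idea (all names hypothetical; a threading.Lock stands in for the real shared/exclusive file locks):

```python
# Sketch: the file class only knows how to hand out locked table reads;
# each aggregator owns all logic for its own data.
import threading

class ContainerFile:
    def __init__(self, tables):
        self._tables = dict(tables)
        self._lock = threading.Lock()

    def read_table(self, name):
        # acquire the lock (shared, in the real design) for the read
        with self._lock:
            return list(self._tables[name])

class Tags:
    def __init__(self, containerfile):
        self._containerfile = containerfile

    def __iter__(self):
        # the aggregator uses only the generic accessor, so new
        # aggregators need no changes to ContainerFile
        return iter(self._containerfile.read_table('tags'))

cf = ContainerFile({'tags': ['solvated', 'equilibrated']})
print(list(Tags(cf)))  # ['solvated', 'equilibrated']
```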
To allow quick random access to an XTC trajectory, MDAnalysis generates an index of its frames when prompted. This index can take several minutes to build, and if the Universe in question is defined with multiple XTC files, it can take far longer. It would be incredibly useful if these indexes could be saved and recalled later.
The XTCReader in MDAnalysis can write and read these indices to and from disk, so the major question is how to implement such that a stale index is not applied to a trajectory that has changed. This will be where most of the work will come.
Addendum: File locking will need to apply here. The existing machinery for doing this (Core.Files) should be used somehow.
It's very common for me to loop through all the members in a Group and apply some function to each one. It would be very useful to have a map method that takes a function as input and a parameter for how many processes to use to pool out the application of the function in parallel.
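A sketch of such a map method (names hypothetical), with a parameter for the number of worker processes:

```python
# Sketch: apply a function to every member, optionally fanning out over
# a multiprocessing pool.
from multiprocessing import Pool

def members_map(func, members, processes=1):
    """Apply func to each member; use a process pool when processes > 1.

    Note: with processes > 1, func must be picklable (a top-level
    function), per the usual multiprocessing constraints.
    """
    if processes > 1:
        with Pool(processes) as pool:
            return pool.map(func, members)
    return [func(m) for m in members]

print(members_map(len, ['ab', 'c']))  # [2, 1]
```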
Bundles function somewhat as throwaway Groups, giving built-in methods for dealing with whole collections of Containers. It would be particularly pythonic if addition between Containers creates a Bundle, and addition between a Bundle and a Container adds that Container to the Bundle.
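A sketch of the proposed addition semantics, with toy stand-ins for the real classes:

```python
# Sketch: Container + Container -> Bundle; Bundle + Container appends;
# Bundle + Bundle merges.
class Container:
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        if isinstance(other, Container):
            return Bundle([self, other])
        return NotImplemented

class Bundle:
    def __init__(self, members):
        self.members = list(members)

    def __add__(self, other):
        if isinstance(other, Container):
            return Bundle(self.members + [other])
        if isinstance(other, Bundle):
            return Bundle(self.members + other.members)
        return NotImplemented

b = Container('a') + Container('b') + Container('c')
print([m.name for m in b.members])  # ['a', 'b', 'c']
```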
With MDS 0.5.1 the following fails:
import numpy as np
import pandas as pd
import mdsynthesis as mds
df = pd.DataFrame({('R1', 'NZ1'): np.arange(3), ('R1', 'NZ2'): np.arange(3,0,-1),
('T2', 'OG1'): np.arange(3)*0.5,
('Q3', 'OE1'): np.arange(3)*2, ('Q3', 'OE1'): np.arange(3)*(-2),
})
sim = mds.Sim('boba')
sim.data.add('multi', df)
with the error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-105-5af881236448> in <module>()
----> 1 sim.data.add('multi', df)
/tmp/src/datreant/datreant/aggregators.py in inner(self, handle, *args, **kwargs)
609
610 try:
--> 611 out = func(self, handle, *args, **kwargs)
612 finally:
613 del self._datafile
/tmp/src/datreant/datreant/aggregators.py in add(self, handle, data)
688
689 """
--> 690 self._datafile.add_data('main', data)
691
692 def remove(self, handle, **kwargs):
/tmp/src/datreant/datreant/persistence.py in add_data(self, key, data)
1380 os.path.join(self.datadir, pydatafile), logger=self.logger)
1381
-> 1382 self.datafile.add_data(key, data)
1383
1384 # dereference
/tmp/src/datreant/datreant/persistence.py in inner(self, *args, **kwargs)
292 self.handle = self._open_file_w()
293 try:
--> 294 out = func(self, *args, **kwargs)
295 finally:
296 self.handle.close()
/tmp/src/datreant/datreant/persistence.py in add_data(self, key, data)
1567 self.handle.put(
1568 key, data, format='table', data_columns=True, complevel=5,
-> 1569 complib='blosc')
1570 except AttributeError:
1571 self.handle.put(
/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in put(self, key, value, format, append, **kwargs)
812 format = get_option("io.hdf.default_format") or 'fixed'
813 kwargs = self._validate_format(format, kwargs)
--> 814 self._write_to_group(key, value, append=append, **kwargs)
815
816 def remove(self, key, where=None, start=None, stop=None):
/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
1250
1251 # write the object
-> 1252 s.write(obj=value, append=append, complib=complib, **kwargs)
1253
1254 if s.is_table and index:
/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
3755 self.create_axes(axes=axes, obj=obj, validate=append,
3756 min_itemsize=min_itemsize,
-> 3757 **kwargs)
3758
3759 for a in self.axes:
/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
3357 axis, axis_labels = self.non_index_axes[0]
3358 data_columns = self.validate_data_columns(
-> 3359 data_columns, min_itemsize)
3360 if len(data_columns):
3361 mgr = block_obj.reindex_axis(
/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in validate_data_columns(self, data_columns, min_itemsize)
3220 if info.get('type') == 'MultiIndex' and data_columns:
3221 raise ValueError("cannot use a multi-index on axis [{0}] with "
-> 3222 "data_columns {1}".format(axis, data_columns))
3223
3224 # evaluate the passed data_columns, True == use all columns
ValueError: cannot use a multi-index on axis [1] with data_columns True
It is quite likely that this is a problem that I (or MDS) have with pandas --- any insights welcome.
Universes can now iterate through auxiliary data in addition to a trajectory. We want to persist defined auxiliaries in the Universe definition for a Sim in the same way that we persist other information, such as kwargs.
Although the keys for all data elements currently display by default using, e.g., Sim.data, it would be useful to be able to get a listing of data keys that match a query. This could be as simple as making some kind of Data.isin method that takes a string as input and outputs all keys that have that string present.
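A sketch of the suggested Data.isin (the method name comes from this issue; the implementation is hypothetical):

```python
# Sketch: a substring query over stored data keys.
class Data:
    def __init__(self, keys):
        self._keys = list(keys)

    def isin(self, query):
        """Return all keys containing `query` as a substring."""
        return [k for k in self._keys if query in k]

d = Data(['rmsd/backbone', 'rmsd/all', 'rgyr'])
print(d.isin('rmsd'))  # ['rmsd/backbone', 'rmsd/all']
```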
Trying to import mdsynthesis fails:
----> 1 import mdsynthesis as mds
build/bdist.linux-x86_64/egg/mdsynthesis/__init__.py in <module>()
build/bdist.linux-x86_64/egg/mdsynthesis/treants.py in <module>()
build/bdist.linux-x86_64/egg/mdsynthesis/limbs.py in <module>()
ImportError: No module named AtomGroup
Iterating through a Universes aggregator should yield keys. Also, we need a keys() method for the aggregator.
In the spirit of datreant/datreant#100, we want to get rid of the concept of treanttypes and make all statefiles have names like Treant.<uuid>.json. This has the benefit that tools such as datreant.cli can easily work with all Treants, not just those generated with datreant.core. It also greatly simplifies the relationship between datreant and libraries such as mdsynthesis, allowing us to fix some annoying behavior.
To accomplish this consistently, mdsynthesis must at least:
- have a discover method that only selects Treants that already feature the mdsynthesis namespace in their state file, returning these as Sim objects. This will come with a performance penalty, since now the files must be parsed to check for this condition, whereas before the filename encoded it.
- have a Sim object that, upon use on an existing Treant file, creates the mdsynthesis namespace, marking it as a Sim for later discovery.
- expose datreant components on import, such as discover or Bundle, as it currently does.
In order for this scheme to work consistently, datreant.core.Bundle must be modified to not allow paths as input, but instead only take Treant objects or their subclasses directly (otherwise it's not clear what class to use on the path). We must check that serialization and deserialization still work under this scheme.
Something that's become a clear problem from changes in upstream datreant.core is the state Sim instances carry with them, this being their "active" universe. The original idea was that for a given simulation one might have several different post-processed trajectories, and therefore perhaps different MDAnalysis.Universe definitions along with their own selections. It's not a bad idea, but it means that you will get something different from different instances of the same Sim when doing Sim.universe, depending on what you've done previously.
This is an issue especially when working with Bundles of Sims, since doing set-style operations between different Bundles with overlapping sets of Sims (but not the same instances) means that one has no guarantee of getting a Sim with the state they expect.
I propose to eliminate the "multiple Universes" functionality from Sim objects. This will reduce the state of a Sim to that stored in its state file (good!) and that stored in its loaded Universe instance (not great, but with new features coming in upstream MDAnalysis, this could be mitigated). It will also simplify the Sim API greatly.
For those that use the "multiple Universes" functionality (myself included), I think nesting Sim objects might be a good general solution. Something like:
main_Sim
|-> Sim.<uuid>.json
|
|-> no_water
| |
| |->Sim.<uuid>.json
|
|-> fitted
|
|-> Sim.<uuid>.json
will work, and will be easy to use given work being done in upstream datreant.core.
Thoughts?