
mdsynthesis's Introduction

MDSynthesis: a persistence engine for molecular dynamics data



Development of MDSynthesis has stopped. For new projects we recommend you use datreant directly.


As computing power increases, it is now possible to produce hundreds of molecular dynamics simulation trajectories that vary widely in length, system size, composition, starting conditions, and other parameters. Managing this complexity in ways that allow use of the data to answer scientific questions has itself become a bottleneck. MDSynthesis is an answer to this problem.

Built on top of datreant, MDSynthesis provides a Pythonic interface to molecular dynamics trajectories through MDAnalysis, making it easy to work with data from many simulations scattered throughout the filesystem. It makes it possible to write analysis code that works across many varieties of simulation; even more importantly, MDSynthesis allows interactive work with the results from hundreds of simulations at once without much effort.

Efficiently store intermediate data from individual simulations for easy recall

The MDSynthesis Sim object gives an interface to raw simulation data through MDAnalysis. Data structures generated from raw trajectories (pandas objects, numpy arrays, or any pure python structure) can then be stored and easily recalled later. Under the hood, datasets are stored in the efficient HDF5 format when possible.
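The storage pattern can be sketched with the standard library alone; the hypothetical DataStore below stands in for Sim.data, with pickle in place of the HDF5 backend that MDSynthesis prefers for pandas and numpy objects:

```python
import os
import pickle

class DataStore:
    """Minimal sketch of the Sim.data idea: each dataset gets its own
    subdirectory under the Sim's tree and is persisted to disk, so it
    can be recalled in a later session. Illustrative only; the real
    implementation stores pandas/numpy objects in HDF5 where possible."""

    def __init__(self, basedir):
        self.basedir = basedir

    def add(self, key, value):
        dirpath = os.path.join(self.basedir, key)
        os.makedirs(dirpath, exist_ok=True)
        # 'wb' truncates, so re-adding a dataset replaces the old one
        with open(os.path.join(dirpath, 'pyData.pkl'), 'wb') as f:
            pickle.dump(value, f)

    def __getitem__(self, key):
        with open(os.path.join(self.basedir, key, 'pyData.pkl'), 'rb') as f:
            return pickle.load(f)
```

Because each dataset lives in its own directory, datasets can be added, replaced, or removed independently of one another.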

Powered by datreant under the hood

MDSynthesis is built on top of the general-purpose datreant library. The Sim is a Treant with special features for working with molecular dynamics data, but every feature of datreant applies to MDSynthesis.

Documentation

A brief user guide is available on Read the Docs.

Contributing

This project is still under heavy development, and there are certainly rough edges and bugs. Issues and pull requests welcome!

MDSynthesis follows the development model of datreant; see the contributor's guide to learn how to get started with contributing back.

mdsynthesis's People

Contributors

dotsdl, kaceyaurum, kain88-de, orbeckst, richardjgowers, sseyler


mdsynthesis's Issues

Storing 1D DataFrame gives Pandas TypeError on retrieval

I'm not exactly sure what's going on here, but I can't store a 1D DataFrame. Retrieving a (10,1) DataFrame gives the error "TypeError: Index(...) must be called with a collection of some kind, None was passed":

import mdsynthesis as mds                                                        
import numpy as np                                                               
import pandas as pd                                                              

s = mds.Sim('marklar')                                                           
s.data.add('test1',pd.DataFrame(np.zeros((1,10))))                                                            
print s.data["test1"]        #Good

s.data.add('test2',pd.Series(np.zeros((10,))))                                                                        
print s.data["test2"]        #Good

s.data.add('test3',pd.DataFrame(np.zeros((10,10))))                                                              
print s.data["test3"]        #Good

s.data.add('test4',pd.DataFrame(np.zeros((10,1))))                                                              
print s.data["test4"]        #Not Good

I'm pretty sure this is actually a problem with writing the file to disk because the h5 file seems weird:

>>> z = pd.HDFStore("test4/pdData.h5", 'r')
>>> z
<class 'pandas.io.pytables.HDFStore'>
File path: pdData.h5
/main            [invalid_HDFStore node: sequence item 0: expected string, numpy.int64 found]

Any idea how to fix this?

Add a 'topologies' and 'trajectories' properties.

We need convenient mechanisms for modifying universe definitions. One way to do this would be to have Universes.topology and Universes.trajectory properties that allow getting and setting of the corresponding elements. If it isn't too confusing, we could also add __setitem__ and __getitem__ to the underlying objects to allow setting and getting of any universe definition.
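A rough sketch of what those properties could look like, in plain Python with hypothetical names and none of the state-file plumbing:

```python
class Universes:
    """Sketch of the proposed interface: setting a universe definition
    records a topology plus trajectories, and the `topologies` and
    `trajectories` properties expose them for every universe. All names
    here are hypothetical."""

    def __init__(self):
        self._defs = {}   # universe name -> {'topology': ..., 'trajectory': [...]}

    def __setitem__(self, name, definition):
        # definition is a sequence: [topology, traj1, traj2, ...]
        topology, *trajectory = definition
        self._defs[name] = {'topology': topology, 'trajectory': trajectory}

    def __getitem__(self, name):
        return self._defs[name]

    @property
    def topologies(self):
        return {n: d['topology'] for n, d in self._defs.items()}

    @property
    def trajectories(self):
        return {n: d['trajectory'] for n, d in self._defs.items()}
```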

Make common convenience methods for aggregating data for ``Group.members`` and ``Bundle``.

Group.members and Bundle are intended to make it easy to manipulate many Containers at once, but currently they only give access to the objects themselves. It would be useful to include methods that yield aggregate information from these collections. Both objects would have these methods in common.

For example, could have

Bundle.data, which gives access to concatenations of stored pandas data sets. It grabs any datasets it can that match the handle given, and tries to concatenate them. Would be useful for quickly aggregating and manipulating ensemble data.

Bundle.tags, which gives all tags present in the collection. Could have keywords for any and all criterion for what to return.
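A sketch of how Bundle.data might do the concatenation, assuming (hypothetically) that each member exposes a name and a dict-like data attribute; the real API would read the datasets from each member's stored files:

```python
import pandas as pd

def bundle_data(members, handle):
    """Sketch of the proposed Bundle.data aggregation: grab the dataset
    named `handle` from every member that has it and concatenate the
    pieces, keyed by member name. Function and attribute names here are
    hypothetical."""
    pieces = {}
    for m in members:
        try:
            pieces[m.name] = m.data[handle]
        except KeyError:
            continue   # member lacks this dataset; skip it
    # member names become the outer level of a hierarchical index
    return pd.concat(pieces, names=['member'])
```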

Need mechanism for updating file schema from previous versions.

Upon loading an existing Container, the corresponding ContainerFile subclass should do a version check between the version given by the state file and the current version of MDS. It should then run code that updates the schema to that used by the current version of MDS.

This will require:

  1. an explicitly-documented schema spec for each ContainerFile subclass, which can change each release
  2. a mechanism for performing iterative updates to existing files; in other words, a file that was made with a very old version of MDS gets schema updates for each version of MDS it is behind, until it reaches the current one
  3. the mechanism to update files needs to be stress-tested with its own set of unit tests; it should be robust enough to restart conversion of a file that is only half-converted to a new version, which might happen if the python session dies mid-conversion.
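A minimal sketch of point 2, with an assumed migration registry and dict-based state (the real state lives in the ContainerFile on disk):

```python
# Hypothetical sketch of the iterative schema-update mechanism: each
# release registers a migration from the previous version, and a file is
# walked forward one version at a time until it reaches the current schema.
MIGRATIONS = {}   # maps from-version -> (to-version, migration function)

def register(from_v, to_v):
    def deco(func):
        MIGRATIONS[from_v] = (to_v, func)
        return func
    return deco

def upgrade(state):
    """Apply migrations until `state` reaches the newest schema. `state`
    is a dict with a 'version' key; the version is advanced only after a
    step succeeds, so a half-converted file can resume from where the
    conversion died."""
    while state['version'] in MIGRATIONS:
        to_v, func = MIGRATIONS[state['version']]
        func(state)
        state['version'] = to_v   # record progress after each step
    return state
```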

module naming not PEP8 compliant

Having all modules start with uppercase looks a bit weird and is not the recommendation of PEP8:

  • "Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged."
  • "Class names should normally use the CapWords convention. "
  • "Function names should be lowercase, with words separated by underscores as necessary to improve readability."

(I know that e.g. MDAnalysis is also not fully compliant but this is a new project and you still have a chance to do it right without p***ing off too many users...)

Add simple query for data elements in a Container.

Although the keys for all data elements currently display by default when using, e.g., Sim.data, it would be useful to be able to get a listing of data keys that match a query. This could be as simple as making some kind of Data.isin method that takes a string as input and outputs all keys that have that string present.
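A sketch of the Data.isin idea over a plain list of keys (the real method would read the keys from the Container's data directory):

```python
def isin(keys, text):
    """Sketch of the proposed Data.isin query: return every stored data
    key that contains the given string. The name comes from the issue;
    the real method would live on the data limb."""
    return [key for key in keys if text in key]
```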

Crashes when opening scalar numpy arrays

Here's a fun one for the test suite, MDSynthesis crashes when you open up scalar numpy arrays. This can easily be avoided by not storing scalars in the first place, but it's worth mentioning!

import mdsynthesis as mds
import numpy as np
s=mds.Sim("marlar")
s.data['harhar']=np.array(20)
s.data['harhar']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/aggregators.py", line 960, in __getitem__
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/aggregators.py", line 899, in inner
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/aggregators.py", line 1154, in retrieve
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 1504, in get_data
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 1819, in inner
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 1878, in get_data
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/Users/cing/Projects/h5py/h5py/_objects.c:2458)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/Users/cing/Projects/h5py/h5py/_objects.c:2415)
  File "/Users/cing/anaconda/lib/python2.7/site-packages/h5py-2.5.0-py2.7-macosx-10.5-x86_64.egg/h5py/_hl/dataset.py", line 418, in __getitem__
    selection = sel2.select_read(fspace, args)
  File "/Users/cing/anaconda/lib/python2.7/site-packages/h5py-2.5.0-py2.7-macosx-10.5-x86_64.egg/h5py/_hl/selections2.py", line 92, in select_read
    return ScalarReadSelection(fspace, args)
  File "/Users/cing/anaconda/lib/python2.7/site-packages/h5py-2.5.0-py2.7-macosx-10.5-x86_64.egg/h5py/_hl/selections2.py", line 80, in __init__
    raise ValueError("Illegal slicing argument for scalar dataspace")
ValueError: Illegal slicing argument for scalar dataspace

MDSynthesis stores the scalars just fine and I don't see anything wrong with reading them using h5py as per usual:

>>> import h5py
>>> f = h5py.File('marlar/harhar/npData.h5', 'r')
>>> f['main'].value
20
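The traceback bottoms out in h5py's scalar-dataspace selection: a scalar dataset has no axis to slice, so it must be read with empty-tuple indexing (`dset[()]`) rather than a slice, which is presumably what the retrieval code does. The same restriction holds for a 0-d numpy array, which makes the root cause easy to reproduce without HDF5 at all:

```python
import numpy as np

scalar = np.array(20)        # 0-d array, analogous to a scalar HDF5 dataset
value = scalar[()]           # empty-tuple indexing is how scalars are read
assert value == 20

try:
    scalar[:]                # slicing has no axis to act on
except IndexError as err:
    print("slicing a 0-d array fails:", err)
```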

Design of ContainerFile / Tags system

So in trying to write tests for the Tags system, I've had some trouble with the design

currently it's something like

class ContainerFile:
    def get_tags(self):
        # do all the work on getting

    def set_tags(self):
        # do some other work on setting of tags

class Tags:
    def __iter__(self):
        return iter(self._containerfile.get_tags())

    def add(self, things):
        # do some work
        self._containerfile.add_tags(processed_things)

So when getting/setting tags, the work is split between 2 classes... Ideally all the work should belong to the Tags class, so something more like

class ContainerFile:
    # doesn't have to know he has tags

class Tags:
    def get_tags(self):
        table = self._containerfile.handle.get_node()

So any aggregators plug into the ContainerFile and use its API; i.e., I could write a new aggregator without having to modify ContainerFile.

I'm not 100% sure how this would work with all the file decorators; maybe generic ContainerFile.read_table and ContainerFile.write_table methods which Aggregators could use...

Maybe a .get_read_handle() method which returns the handle with the _read_state lock?

Thoughts?

Remove Sim treanttype

In the spirit of datreant/datreant#100, we want to get rid of the concept of treanttypes and make all statefiles have names like Treant.<uuid>.json. This has the benefit that tools such as datreant.cli can easily work with all Treants, not just those generated with datreant.core. It also greatly simplifies the relationship between datreant and libraries such as mdsynthesis, allowing us to fix some annoying behavior.

To accomplish this consistently, mdsynthesis must at least:

  1. Feature its own discover method that only selects Treants that already feature the mdsynthesis namespace in their state file, returning these as Sim objects. This will come with a performance penalty since now the files must be parsed to check for this condition, whereas before the filename encoded this.
  2. Have a Sim object that, upon use on an existing Treant file, creates the mdsynthesis namespace marking it as a Sim for later discovery.
  3. Not change the behavior of any datreant components on import, such as discover or Bundle, as it currently does.

In order for this scheme to work consistently, datreant.core.Bundle must be modified to not allow paths as input, but instead only take Treant objects or their subclasses directly (otherwise it's not clear what class to use on the path). Must check that serialization and deserialization still works under this scheme.

Pickled Python data structures cannot be modified in append mode

The problem is pretty straightforward:

>>> import mdsynthesis as mds
>>> s = mds.Sim("marlar")
>>> s.data['poop'] = 25
>>> s.data['poop']
25
>>> s.data['poop'] = 50
>>> s.data['poop']
25
>>> s.data.add('poop', 50)
>>> s.data['poop']
25
>>> s.data['poop2'] = [1, 2]
>>> s.data['poop2']
[1, 2]
>>> s.data['poop2'] = [1, 2, 3]
>>> s.data['poop2']
[1, 2]

I dug a little into this one, and the cause of the issue seems to be the file mode ab+; it straight up doesn't work for pickle dumping (see the code at mdsynthesis/core/persistence.py:1962, where you use that mode). Appending a second pickle to the file leaves the original pickle at the start, and pickle.load only ever reads the first one. I can submit a one-line PR that changes the mode to "wb+" if there's no reason not to.

Here's the MDSynthesis-free root of the problem:

>>> import pickle
>>> pickle.dump(20, open("marlar.pkl", "ab+"))
>>> pickle.load(open("marlar.pkl", "rb"))
20
>>> pickle.dump(30, open("marlar.pkl", "ab+"))
>>> pickle.load(open("marlar.pkl", "rb"))
20
>>> pickle.dump(30, open("marlar.pkl", "wb"))
>>> pickle.load(open("marlar.pkl", "rb"))
30

Make name and location setters get exclusive lock

Since it's possible for name and location properties to change the path to a Container's statefile, these should obtain an exclusive lock on the statefile before being applied. A possible problem with this is that it requires an open file descriptor. What is a good solution?

Cryptic error on opening Sim with conflicting name with file in working directory

Just a little better error handling is needed here, or well, you could support Sims with the same path name as files but that could get really confusing!

In shell:

touch marlar

then in Python,

import mdsynthesis as mds
s=mds.Sim("marlar")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/containers.py", line 512, in __init__
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/containers.py", line 224, in _regenerate
  File "build/bdist.macosx-10.5-x86_64/egg/mdsynthesis/core/persistence.py", line 65, in containerfile
UnboundLocalError: local variable 'statefileclass' referenced before assignment

Allow globbing syntax for Data.remove().

Since datasets are stored in a directory structure, and since their names reflect this, it would be fairly easy to make deletions using globbing. This would be a great convenience when some datasets matching a pattern should be removed without removing others.
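A sketch of the idea using fnmatch over an in-memory key -> data mapping (the real Data.remove() would delete dataset directories on disk):

```python
import fnmatch

def remove_matching(datasets, pattern):
    """Sketch of globbing for Data.remove(): delete every stored dataset
    whose key matches a glob pattern, and report what was removed.
    `datasets` stands in for the key -> data mapping."""
    doomed = fnmatch.filter(datasets, pattern)   # collect first,
    for key in doomed:                           # then delete
        del datasets[key]
    return doomed
```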

failed to store DataFrame with column multi-index

With MDS 0.5.1 the following fails:

import numpy as np
import pandas as pd
import mdsynthesis as mds

df = pd.DataFrame({('R1', 'NZ1'): np.arange(3), ('R1', 'NZ2'): np.arange(3,0,-1),
                   ('T2', 'OG1'): np.arange(3)*0.5,
                  ('Q3', 'OE1'): np.arange(3)*2, ('Q3', 'OE1'): np.arange(3)*(-2),
                  })

sim = mds.Sim('boba')
sim.data.add('multi', df)

with the error


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-105-5af881236448> in <module>()
----> 1 sim.data.add('multi', df)

/tmp/src/datreant/datreant/aggregators.py in inner(self, handle, *args, **kwargs)
    609 
    610             try:
--> 611                 out = func(self, handle, *args, **kwargs)
    612             finally:
    613                 del self._datafile

/tmp/src/datreant/datreant/aggregators.py in add(self, handle, data)
    688 
    689         """
--> 690         self._datafile.add_data('main', data)
    691 
    692     def remove(self, handle, **kwargs):

/tmp/src/datreant/datreant/persistence.py in add_data(self, key, data)
   1380                 os.path.join(self.datadir, pydatafile), logger=self.logger)
   1381 
-> 1382         self.datafile.add_data(key, data)
   1383 
   1384         # dereference

/tmp/src/datreant/datreant/persistence.py in inner(self, *args, **kwargs)
    292                 self.handle = self._open_file_w()
    293                 try:
--> 294                     out = func(self, *args, **kwargs)
    295                 finally:
    296                     self.handle.close()

/tmp/src/datreant/datreant/persistence.py in add_data(self, key, data)
   1567             self.handle.put(
   1568                 key, data, format='table', data_columns=True, complevel=5,
-> 1569                 complib='blosc')
   1570         except AttributeError:
   1571             self.handle.put(

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in put(self, key, value, format, append, **kwargs)
    812             format = get_option("io.hdf.default_format") or 'fixed'
    813         kwargs = self._validate_format(format, kwargs)
--> 814         self._write_to_group(key, value, append=append, **kwargs)
    815 
    816     def remove(self, key, where=None, start=None, stop=None):

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in _write_to_group(self, key, value, format, index, append, complib, encoding, **kwargs)
   1250 
   1251         # write the object
-> 1252         s.write(obj=value, append=append, complib=complib, **kwargs)
   1253 
   1254         if s.is_table and index:

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in write(self, obj, axes, append, complib, complevel, fletcher32, min_itemsize, chunksize, expectedrows, dropna, **kwargs)
   3755         self.create_axes(axes=axes, obj=obj, validate=append,
   3756                          min_itemsize=min_itemsize,
-> 3757                          **kwargs)
   3758 
   3759         for a in self.axes:

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in create_axes(self, axes, obj, validate, nan_rep, data_columns, min_itemsize, **kwargs)
   3357             axis, axis_labels = self.non_index_axes[0]
   3358             data_columns = self.validate_data_columns(
-> 3359                 data_columns, min_itemsize)
   3360             if len(data_columns):
   3361                 mgr = block_obj.reindex_axis(

/usr/local/lib/python2.7/dist-packages/pandas/io/pytables.pyc in validate_data_columns(self, data_columns, min_itemsize)
   3220         if info.get('type') == 'MultiIndex' and data_columns:
   3221             raise ValueError("cannot use a multi-index on axis [{0}] with "
-> 3222                              "data_columns {1}".format(axis, data_columns))
   3223 
   3224         # evaluate the passed data_columns, True == use all columns

ValueError: cannot use a multi-index on axis [1] with data_columns True

It is quite likely that this is a problem that I (or MDS) have with pandas --- any insights welcome.

Split off MDSynthesis core into another package?

Since the basic structure of MDSynthesis isn't entirely specific to molecular dynamics simulation, I think it might make sense to break the core out into a separate package. I'd been considering this for a while, but today I was contacted by someone with a use-case outside of MD, and I think it would be of benefit to MDSynthesis' core development to make it more general.

I've started a repository for this work here: https://github.com/dotsdl/datreant

Is anyone opposed to this? One thing to note is that datreant is BSD 3-clause licensed, making its use more permissive than MDSynthesis, which is GPLv2 (same as MDAnalysis). I technically need permission from everyone that has contributed code so far to make this change.

P.S. Opinions on the name are welcome. I wanted something that included 'dat' to indicate 'data', and some kind of word to indicate 'trees' (as in directory trees). An old D&D woodland creature ('treant') came to mind. :D

Need stress tests for file locking / concurrency

Despite the advantages of structuring Containers with their stored data at the filesystem level instead of shoving them into a single database, the major disadvantage is that we have to be careful that file locks are working to ensure against concurrency corruption. We need stress tests of the file locking mechanisms for:

  • ContainerFile and its child classes
  • DataFile variants

We also need to test that changes in filesystem elements that occur through Container properties (e.g. #19) are also handled gracefully, though this will almost certainly require the underlying ContainerFile object to use the Foxhound to fetch its new path.

``Group.members`` should also allow indexing by member name

Currently, Group.members[:] yields a list of all members, and slicing by index is allowed to get subsets, since members have an order. However, it would be most convenient to be able to select members by name, such as with::

Group.members['lark']

or::

Group.members[['lark', 'hark']]

as is used by pandas DataFrames to select multiple columns. This should then yield a Bundle object (instead of a list) containing all members that match the given names. Since names are not required to be unique, this could return many more members than the number of names supplied. This would allow calls such as::

Group.members[['lark', 'hark']].data['ionbinding']

which would retrieve a concatenation of all concatenatable datasets present in the Bundle matching the name 'ionbinding', once Issue #8 is addressed.
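The proposed indexing could be sketched like this, with stand-in classes (the real version would return a Bundle rather than a list):

```python
class Member:
    """Stand-in for a Group member; only the name matters here."""
    def __init__(self, name):
        self.name = name

class Members:
    """Sketch of the proposed indexing: integers and slices keep their
    usual meaning, while a name (or list of names) selects every member
    whose name matches. Class names here are illustrative."""
    def __init__(self, members):
        self._members = list(members)

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self._members[key]
        names = [key] if isinstance(key, str) else list(key)
        # names need not be unique, so this can return more members than
        # names given; the real API would wrap the result in a Bundle
        return [m for m in self._members if m.name in names]
```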

Need test coverage of Sims and Groups.

Pytest is our testing framework of choice, and some basic tests have been written for Container elements (tags and categories), but these need to be expanded to cover all components of Sims and Groups. This includes:

  • universes
  • selections
  • members
  • renaming and re-locating
  • data (pandas, numpy, and pure-python objects)

Additional, less atomic tests can be added later. These may include, e.g., organizational patterns:

  • does everything work when Containers are nested (in another Container's tree)?
  • does Foxhound find a Container that has been moved? Does it do it reasonably quickly?

make a 1.0 release?

The docs still state that this is alpha software but I think in truth it has been in stable use for more than a year. Isn't it time to slap a 1.0 on it?

(Or are we waiting for MDAnalysis 0.16.0?)

Make Sims and Groups read-only usable

It may be useful for others in a lab group on a shared volume to be able to use Sims and Groups of others with read permissions but without write permissions. If Sims and Groups can generally be made to work with only read permissions for accessing their stored attributes/data, this would make this possible.

The main changes would come in the __init__ methods of each container.

raise KeyError instead of NoSuchNodeError

I like using selections (and everything else) in a pythonic fashion (i.e. if it looks like a dict it should mostly behave like one even if under the hood it's all HDF5). One strength of MDS is hiding all the bookkeeping.

Therefore, it is annoying if a non-existent, say, selection raises NoSuchNodeError (no idea what kind of exception this is) when I tried

try:
   sel = self.sim.selections[name]
except KeyError:
   # do something about it because selection 'name' is not stored in the sim

because, from the syntax, I expected to get a KeyError.
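NoSuchNodeError comes from PyTables (it appears in the tracebacks elsewhere on this tracker); a thin translation layer at the aggregator boundary would restore the dict contract. A sketch, with a stand-in exception class so the example is self-contained:

```python
class NoSuchNodeError(Exception):
    """Stand-in for tables.NoSuchNodeError."""

class Selections:
    """Sketch of the requested behaviour: the backend's NoSuchNodeError
    is translated into the KeyError a dict-like interface promises."""
    def __init__(self, store):
        self._store = store

    def _get_node(self, name):
        # mimics what the HDF5 backend does today
        if name not in self._store:
            raise NoSuchNodeError(name)
        return self._store[name]

    def __getitem__(self, name):
        try:
            return self._get_node(name)
        except NoSuchNodeError:
            raise KeyError(name)   # what a dict-like API should raise
```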

Add globbing syntax to Bundle.

Bundle is intended to be a useful grouping tool for Containers without the persistence of a Group. It would be awesome if it could run any strings it receives as input through glob.glob to grab whole sets of Containers from the filesystem easily.
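A sketch of the input handling, with a hypothetical expand_members helper:

```python
import glob

def expand_members(*members):
    """Sketch of glob support for Bundle: strings are run through
    glob.glob to collect Container paths from the filesystem; anything
    else (e.g. an already-loaded Container) passes through untouched.
    The function name is hypothetical."""
    out = []
    for m in members:
        if isinstance(m, str):
            out.extend(sorted(glob.glob(m)))
        else:
            out.append(m)
    return out
```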

Add top-level functions for manipulating/making Sims and Groups

Although Sims and Groups can be created and manipulated just fine with their built-in methods, it would be useful to start making some top-level functions that can do this as well, but potentially taking as input multiple Containers at once.

Some ideas:

mds.copy() could copy stored elements of one container into another new or existing container. It could include keyword arguments to indicate what to include/leave out of the copy. It could also be made a method of Containers themselves, e.g. Sim.copy(sim, all=False, universes=None, ...)

mds.load(*containers) could yield a mds.Bundle object, which behaves like an ordered set of Containers. If a single container path given, will just spit out the loaded Container itself.

There is probably room for methods to add tags, categories, members, universes, and selections to any number of the appropriate Containers as convenience functions. These would also lend themselves to easily building a shell-level script for manipulating Containers and their data.

Make addition of Containers yield a Bundle

Bundles function somewhat as throwaway Groups, giving built-in methods for dealing with whole collections of Containers. It would be particularly pythonic if addition between Containers creates a Bundle, and addition between a Bundle and a Container adds that Container to the Bundle.
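A sketch of the addition semantics with stand-in classes:

```python
class Bundle:
    """Sketch: an ordered collection of Containers. Adding a Container
    (or another Bundle) to a Bundle yields a larger Bundle."""
    def __init__(self, *members):
        self._members = list(members)

    def __len__(self):
        return len(self._members)

    def __add__(self, other):
        if isinstance(other, Bundle):
            return Bundle(*self._members, *other._members)
        return Bundle(*self._members, other)

class Container:
    """Stand-in for Sim/Group; adding two Containers makes a Bundle."""
    def __add__(self, other):
        return Bundle(self) + other
```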

Failing in adding a Universe creates bad state

In [1]: import mdsynthesis as mds

In [2]: S = mds.Sim('cg')

In [5]: S.universes.add('main', ['topol.tpr','cg.xtc'])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-f7469218f877> in <module>()
----> 1 S.universes.add('main', ['topol.tpr','cg.xtc'])

/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/aggregators.py in add(self, handle, topology, *trajectory)
    333                 outtraj.append(traj)
    334 
--> 335         self._backend.add_universe(handle, topology, *outtraj)
    336 
    337         if not self.default():

/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in inner(self, *args, **kwargs)
    224             self._exlock(self.handle)
    225             try:
--> 226                 out = func(self, *args, **kwargs)
    227             finally:
    228                 self.handle.close()

/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in add_universe(self, universe, topology, *trajectory)
    880 
    881         # add topology paths to table
--> 882         table.row['abspath'] = os.path.abspath(topology)
    883         table.row['relCont'] = os.path.relpath(topology, self.get_location())
    884         table.row.append()

/usr/lib/python2.7/posixpath.pyc in abspath(path)
    365 def abspath(path):
    366     """Return an absolute path."""
--> 367     if not isabs(path):
    368         if isinstance(path, _unicode):
    369             cwd = os.getcwdu()

/usr/lib/python2.7/posixpath.pyc in isabs(s)
     59 def isabs(s):
     60     """Test whether a path is absolute"""
---> 61     return s.startswith('/')
     62 
     63 

AttributeError: 'list' object has no attribute 'startswith'

In [6]: S.universes.add('main', 'topol.tpr','cg.xtc')
---------------------------------------------------------------------------
NoSuchNodeError                           Traceback (most recent call last)
<ipython-input-6-f176da1cef98> in <module>()
----> 1 S.universes.add('main', 'topol.tpr','cg.xtc')

/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/aggregators.py in add(self, handle, topology, *trajectory)
    333                 outtraj.append(traj)
    334 
--> 335         self._backend.add_universe(handle, topology, *outtraj)
    336 
    337         if not self.default():

/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in inner(self, *args, **kwargs)
    224             self._exlock(self.handle)
    225             try:
--> 226                 out = func(self, *args, **kwargs)
    227             finally:
    228                 self.handle.close()

/home/richard/.local/lib/python2.7/site-packages/mdsynthesis-0.5.0-py2.7.egg/mdsynthesis/core/persistence.py in add_universe(self, universe, topology, *trajectory)
    872                 '/universes/{}'.format(universe), 'topology')
    873             self.handle.remove_node(
--> 874                 '/universes/{}'.format(universe), 'trajectory')
    875 
    876         # construct topology table

/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in remove_node(self, where, name, recursive)
   1779         """
   1780 
-> 1781         obj = self.get_node(where, name=name)
   1782         obj._f_remove(recursive)
   1783 

/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in get_node(self, where, name, classname)
   1614         # Now we have the definitive node path, let us try to get the node.
   1615         if node is None:
-> 1616             node = self._get_node(nodepath)
   1617 
   1618         # Finally, check whether the desired node is an instance

/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in _get_node(self, nodepath)
   1553             return self.root
   1554 
-> 1555         node = self._node_manager.get_node(nodepath)
   1556         assert node is not None, "unable to instantiate node ``%s``" % nodepath
   1557 

/home/richard/.local/lib/python2.7/site-packages/tables/file.pyc in get_node(self, key)
    434 
    435         if self.node_factory:
--> 436             node = self.node_factory(key)
    437             self.cache_node(node, key)
    438 

/home/richard/.local/lib/python2.7/site-packages/tables/group.pyc in _g_load_child(self, childname)
   1184             childname = join_path(self._v_file.root_uep, childname)
   1185         # Is the node a group or a leaf?
-> 1186         node_type = self._g_check_has_child(childname)
   1187 
   1188         # Nodes that HDF5 report as H5G_UNKNOWN

/home/richard/.local/lib/python2.7/site-packages/tables/group.pyc in _g_check_has_child(self, name)
    400             raise NoSuchNodeError(
    401                 "group ``%s`` does not have a child named ``%s``"
--> 402                 % (self._v_pathname, name))
    403         return node_type
    404 

NoSuchNodeError: group ``/`` does not have a child named ``/universes/main/trajectory``

Sim should take a real MDAnalysis.Universe instance, too

It would be convenient (and more object oriented) if I could say

u = MDAnalysis.Universe(TPR, XTC)
s = Sim(name)
s.universes.add('anyname', u)

instead of s.universes.add('anyname', universe=[TPR, XTC, ...]).

Although I appreciate that this will make it more difficult to recreate the universe. Perhaps should be tackled together with MDAnalysis/mdanalysis#173 , which could supply the necessary state information to reconstitute the universe.

handle folders with several simulations

I sometimes get several simulations of colleagues that are all in one folder

sim
+-- foo.01.xtc
+-- foo.02.xtc
+-- foo.03.xtc
+-- foo.pdb

My current approach is to move these into separate folders, but I would like to be able to tell a Sim that there is more than one simulation in that folder. All simulations have exactly the same setup, so I treat them as an ensemble in my analysis instead of a single simulation.

I know I can just copy them to different directories and then create Sims in them; I'm just curious whether such a workflow would be possible with mdsynthesis.

Using Sim selections

I was playing with the selections tool on Sims, and I bumped into a couple annoyances

ag = u.atoms[:10]

S.selections.add(ag)

# TypeError: object name is not a string: <AtomGroup with 10 atoms>

I guess this ties in with #25 in accepting premade MDA objects. This should be possible as AtomGroups are unambiguous based on a hash of their Universe (to uniquely identify this among S.universes) and then just ag.indices()

S.selections += 'something'

# TypeError: unsupported operand type(s) for +=: 'Selections' and 'str'

Using .add() feels weird to me, intuitively I want to += stuff in. Thoughts?

The current release doesn't work with MDAnalysis 0.16.0dev

trying to import mdsynthesis

----> 1 import mdsynthesis as mds

build/bdist.linux-x86_64/egg/mdsynthesis/__init__.py in <module>()

build/bdist.linux-x86_64/egg/mdsynthesis/treants.py in <module>()

build/bdist.linux-x86_64/egg/mdsynthesis/limbs.py in <module>()

ImportError: No module named AtomGroup
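The ImportError comes from MDAnalysis 0.16 moving the AtomGroup class from MDAnalysis.core.AtomGroup to MDAnalysis.core.groups. A sketch of a compatibility shim, assuming only these two layouts need supporting:

```python
# Compatibility shim sketch: try the new module layout first, fall
# back to the old one, and degrade gracefully if MDAnalysis is absent.
try:
    from MDAnalysis.core.groups import AtomGroup         # MDAnalysis >= 0.16
except ImportError:
    try:
        from MDAnalysis.core.AtomGroup import AtomGroup  # older releases
    except ImportError:
        AtomGroup = None                                 # MDAnalysis not installed
```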

Build in a mechanism for state files to reclaim space

HDF5 doesn't (yet) reclaim space from deleted nodes. Therefore, in principle SimFiles will slowly grow if enough universe definitions / selections are added/replaced. We need to assess how big a problem this is, and what the best solution is for "cleaning" state files that have this potential for bloat.

Add map method to Group.members (and Bundle)

It's very common for me to loop through all the members in a Group and apply some function to each one. It would be very useful to have a map method that takes a function as input and a parameter for how many processes to use to pool out the application of the function in parallel.
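A minimal sketch of what such a method could look like, using a multiprocessing.Pool when more than one process is requested (the name map_members and its signature are assumptions, not the real API):

```python
from multiprocessing import Pool

def map_members(members, func, processes=1):
    """Sketch of a possible members.map(func, processes=N): apply func
    to every member, optionally fanning out over a process pool."""
    if processes > 1:
        pool = Pool(processes=processes)
        try:
            # results come back in member order, like builtin map
            return pool.map(func, members)
        finally:
            pool.close()
            pool.join()
    # serial fallback avoids pool overhead for the common small case
    return [func(member) for member in members]
```

For parallel use, func must be picklable (a module-level function rather than a lambda), which is the usual multiprocessing constraint.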

Universes and members should be locateable by relative paths.

At the moment, Universes and Members use only stored absolute paths for finding the files they need to generate universes and members, respectively. Relative paths from the Container's basedir are also stored, but they are not yet used. This breaks functionality for Sims and Groups when moving them around in a filesystem, even when the relative paths between these Containers and the files they require haven't changed.

The relative paths should be tried in the event the absolute path fails. If the file is found, the absolute path should be updated.
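The fallback logic could look something like this sketch (the function name and error handling are assumptions):

```python
import os

def resolve_path(abspath, relpath, basedir):
    """Sketch: try the stored absolute path first, then the stored
    relative path resolved against the Container's basedir. On a
    successful fallback the caller would update the stored absolute
    path to the returned value."""
    if os.path.exists(abspath):
        return abspath
    candidate = os.path.normpath(os.path.join(basedir, relpath))
    if os.path.exists(candidate):
        return os.path.abspath(candidate)
    raise IOError("cannot find file via '{}' or '{}'".format(abspath, relpath))
```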

Data aggregator doesn't have keys method

Sim.selections.keys() # works
Sim.categories.keys() # works

# but
Sim.data.keys() # doesn't work
Sim.universes.keys() # doesn't work

If I'm using a dict-like object, it makes sense for it to have a keys method. I had a quick look through the code, and it looks like the data aggregator is different, but ideally it should behave like the others? Maybe link keys to data._list?
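A minimal sketch of what linking keys to the internal listing could look like; the Data class and _list method here are stand-ins, not the actual limb implementation:

```python
class Data(object):
    """Stand-in for the data aggregator limb."""
    def __init__(self, names):
        self._names = list(names)

    def _list(self):
        # stand-in for the limb's existing dataset enumeration
        return list(self._names)

    def keys(self):
        # dict-like interface, mirroring selections and categories
        return self._list()
```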

Store MDAnalysis XTCReader index for XTC files.

To allow quick random access to an XTC trajectory, MDAnalysis generates an index of its frames when prompted. This index can take several minutes to build, and if the Universe in question is defined with multiple XTC files, it can take far longer. It would be incredibly useful if these indices could be saved and recalled later.

The XTCReader in MDAnalysis can write and read these indices to and from disk, so the major question is how to implement this so that a stale index is not applied to a trajectory that has changed. This is where most of the work will be.

Addendum: File locking will need to apply here. The existing machinery for doing this (Core.Files) should be used somehow.
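One stale-detection approach is to fingerprint the trajectory file and store the fingerprint alongside the index; a sketch with all names assumed (this is not the actual XTCReader index format, and the real thing would also need the file locking mentioned above):

```python
import os
import pickle

def load_or_build_index(trajfile, indexfile, build):
    """Sketch: reuse a cached index only if the trajectory's size and
    mtime still match; otherwise rebuild via the (expensive) callable
    build and refresh the cache."""
    stat = os.stat(trajfile)
    fingerprint = (stat.st_size, int(stat.st_mtime))
    if os.path.exists(indexfile):
        with open(indexfile, 'rb') as f:
            stored_fp, index = pickle.load(f)
        if stored_fp == fingerprint:
            return index                 # cached index is still valid
    index = build(trajfile)              # expensive: scan the trajectory
    with open(indexfile, 'wb') as f:
        pickle.dump((fingerprint, index), f)
    return index
```

Size plus mtime is a heuristic, not a guarantee; a content checksum would be safer but costs a full read of the trajectory.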

renaming universes?

I haven't found a way to rename a universe --- am I missing something? I was looking for

Sim.universes.rename(old, new)

or at least

Sim.universes[new] = Sim.universes[old]
del Sim.universes[old]
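Since universe definitions are keyed by name, a rename amounts to exactly that copy-then-delete on the stored definitions. A sketch against a plain mapping standing in for the state file (names are illustrative):

```python
def rename_universe(definitions, old, new):
    """Sketch of Sim.universes.rename(old, new); definitions is a dict
    standing in for the stored universe definitions."""
    if old not in definitions:
        raise KeyError("no universe named '{}'".format(old))
    if new in definitions:
        raise ValueError("a universe named '{}' already exists".format(new))
    definitions[new] = definitions.pop(old)
```

A real implementation would also have to move any selections stored under the old universe name.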

One Sim, one Universe: remove multiple Universes functionality

Something that's become a clear problem in light of changes in upstream datreant.core is the state Sim instances carry with them, namely their "active" universe. The original idea was that for a given simulation one might have several different post-processed trajectories, and therefore different MDAnalysis.Universe definitions along with their own selections. It's not a bad idea, but it means that Sim.universe can give something different for different instances of the same Sim, depending on what you've done previously.

This is an issue especially when working with Bundles of Sims, since doing set-style operations between different Bundles with overlapping sets of Sims (but not the same instances) means that one has no guarantee they get the Sim with the state they expect.

I propose to eliminate the "multiple Universes" functionality from Sim objects. This will reduce the state of a Sim to that stored in its state file (good!) and that stored in its loaded Universe instance (not great, but with new features coming in upstream MDAnalysis, could be mitigated). It will also simplify the Sim API greatly.

For those that use the "multiple Universes" functionality (myself included), I think nesting Sim objects might be a good general solution. Something like:

main_Sim
    |-> Sim.<uuid>.json
    |
    |-> no_water
    |   |
    |   |->Sim.<uuid>.json
    |
    |-> fitted
        |
        |-> Sim.<uuid>.json

will work, and will be easy to use given work being done in upstream datreant.core.

Thoughts?

Allow globbing syntax for universe definitions.

When multiple trajectories are needed for a universe definition, it may be useful to allow globbing syntax for the paths stored so that new trajectory files that match the pattern are picked up by the universe. Then again, this feature could be troublesome. How feasible is it to include, and what are some pitfalls?
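A sketch of what expanding a stored pattern at Universe build time might involve; sorting is essential because glob order is arbitrary and frame order depends on it, which is one of the pitfalls (the function name is an assumption):

```python
import glob

def expand_pattern(pattern):
    """Sketch: expand a stored glob pattern into a sorted, non-empty
    list of trajectory files at Universe build time."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise IOError("pattern '{}' matched no trajectory files".format(pattern))
    return paths
```

Other pitfalls: lexical sorting misorders foo.9.xtc after foo.10.xtc without zero-padding, and a pattern can silently pick up partial files from a still-running simulation.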
