An open source Python framework for transition interface and path sampling calculations.
See documentation here:
An open source Python framework for transition interface and path sampling calculations.
Home Page: http://openpathsampling.org
License: MIT License
An open source Python framework for transition interface and path sampling calculations.
See documentation here:
We wanted to use mdtraj to store the topology, but it does not contain periodic boundaries for creating an openmm system that has periodic boundaries. Openmm.Topology has, but in converting from openmm to mdtraj the boundaries get lost.
We could add these boundaries by using
om_topology = md_topology.to_openmm()
om_topology.setUnitCellDimensions(dimensions)
We need some place to get the dimensions from, maybe the box_vectors from a snapshot. But we need to have these available for the simulation when it creates the system object. openmm solves this by sticking non-topology information into the topology object, which I would like to avoid.
Also since we are using mdtraj mostly for analysis we want to keep the md.topology
as the main topology object.
My question would be where to store this extra information. I would think that we might store one configuration, either under a fixed ID or pointing to a configuration. When the storage is created the
topology is also stored and we could just add a snapshot (while a pdb already contains all we need!)
and point to it.
Any suggestions?
I have finished a simple JSON Serializer for mdtraj.Topology that also includes missing elements and restores from a saved JSON. So this can be ready soon
will add storage capabilities for Toy Dynamics and create a Topology object for this as well
Would be great to get at least some basic nose tests going here to test the various modules!
It would be great if files like this one were actually README.md
markdown text files:
https://github.com/choderalab/msm-tis/blob/master/code/python/PySRMSTIS/src/tutorial_sheet.pdf
Path trees of the sort generated by showtree.py
have mainly been used for TPS paths, not TIS paths. To be really useful for TIS paths, which can end in either state A or state B (or state C, D, etc., in these days of multiple state TIS), we need to know which states the paths connect.
Visually, my suggestion is to use a circle (or maybe rounded-rect); either of a color that corresponds to a given state, or putting a string (which would have to be maybe 3 or 4 chars max) corresponding to a given state in the circle. I think the string will look better. This would also mean that the user would have to provide a dictionary of Volume-to-color/name for the states. But that might not be so hard if the state booleans are saved as labelled order parameters in the storage... then the user just needs to specify which names are states, perhaps with a short version for the strings?
@jhprinz , showtree
is your baby; would this be terribly hard to implement?
I’m starting to set up logging. This my first time playing with Python’s logging tools, which are quite nice, and I’m doing it now for a few reasons:
PathMover
s and Calculation
s, we’ll be doing a lot of debug-by-print. This is, of course, what logging is really intended for, and including logging statements makes it easier if we have to check the same part of code more than once.Anyway, the first of those is the main reason I’d like to do this now. The others are useful bonuses we get by properly setting up logging.
Here’s my suggestion for the logger layout:
opentis.initialization
which logs all initialization parameters (handling the second bullet point above) and logs them at level "info"logging.conf
(or ops_logging.conf
) file. Default configuration is initialization
logs info messages to a file initialization.log
, and the module-level initializers log info messages to stdout and ignore debug-level messages. (At least, Calculation
and PathMover
info messages should go to stdout, maybe not everything else: all of this is easily configured in the logging.conf
file)logging.conf
file -- although it puts more work on the user (2 lines of code), everything I’ve read suggests that this is the best practice when writing a framework: the default behavior without the configuration file will be that all log messages are dropped.Thoughts or suggestions? I've got proof of concept pretty much working on this, it's just a matter of adding messages and finding the right default behavior now.
Will make it easier for me to read!
Working to get a first running prototype to start from. There are still a few things that I need to finish:
use XMLSerializer in netCDF storage to better restart
add snapshots with reversed momenta by either
for all inversed frames store additional data like change in energies...
a negative index might be problematic since negative indices in python do not trigger an error but might read wrong frames instead. An additional bool might be overlooked in an implementation but reads at least the correct frame.
from an implementational point of view this translates into the question if an inversed snapshot is really a new object or just a different view. Also we want to be able to read frames fast
are there other changes to snapshots besides momenta inversion? Maybe consider a trajectory for another ensemble like with changed pressure or temperature. In this case it might be good to decouple saving a snapshot and its momenta and positions. This means
- a trajectory is a list of snapshot IDs
- a snapshot has its ID and links to a position ID and/or a momenta ID. This would allow to store snapshots without momenta or generate the same configuration with differerent momenta.
- an indicator if momenta should be reversed (optional - this should be part of a trajectory, but if we wanted to really create a new snapshot then we could do that)
- a snapshot also has additional information like box vectors, energies, etc...
- since in all shooting simulations we have forward and backward trajectories we should store the time direction of each frame in a boolean. Need to think about this.
setup a well documented example for Alanine Dipeptide
rewrite the usage of the KMediod Clustering
include MSM generation
improve documentation
Coule we change the code so that it runs also under Python 3?
I had the idea to add a Siegtank converter which would allow to import a complete Siegetank project into our netCDF Storage and continuously sync it, as it progresses.
The idea was to just get the actual status of a siegetank project. Get all the trajectories one by one, strip water and store the essential part (I assume that I don't need water and or momenta) in our storage.
Unfortunately Siegetank requires Python 3 and to avoid having to seperate scripts it would be great to keep out code also runnable (if not optimal) under Python 3.
This seems to involve mainly
print
to print()
which is not too much work. Most of it can be done automatically.
Would that be a good goal? I mean importing from Siegetank is a nice gimmick but potentially not relevent unless we want to use Siegetank at some point to run our trajectories (which would be compatible withe the next simulation generator scheme)
Would be helpful for readability.
Just to collect ideas separately for a while...
About the name. So this package effective allows to sample ensembles of paths. So something with these three words?
Lots of paths form a street (Autobahn) a map
Cartography?
The examples/
directory could use some reorganizing. Right now, the contents are
alanine/
data/
ipython/
openmm/
Perhaps we could add examples/README.md
as a master index of examples, and then have more meaningful subdirectory names that represent the nature of the system and dynamics under study?
We now have a copy of integrators
in openmmtools.integrators
. Should we use that instead of the integrators.py
here? We have to make some changes to the variable names anyway to avoid potential bugs with CustomForce
systems, so it might be easiest to only maintain one set of these files.
On the other hand, if there's the expectation that a lot of new custom integrators that would not be of general interest might be added, it may be useful to keep a local integrators.py
. But for now, this should probably be avoided.
Might be a good time to do this, since we're getting everything set up with http://openpathsampling.org
This is one of those issues which won’t cause any problems in practice, but should be handled if we’re going to claim that our code can handle any general path-sampling-like approach. I’m adding a tag "noturgent" to mark this and other questions like it. (Until we solve these, our code isn’t as general-purpose as we claim, but these bugs won’t break any likely approach.)
Depending on the nature of the continue conditions (which, in turn, typically depends on the nature of the ensemble), we can get one of two types of paths from DynamicsEngine.generate
. To describe the problem, think in the context of having a continue condition based on Ensemble.can_append
for some ensemble:
LengthEnsemble
requirements at the end (which is how we’re setting up our TIS ensemble, and all the ensembles I can think of from path sampling approaches can at least be written that way), then a path which could be accepted is exactly the path which is generated by DynamicsEngine.generate()
LengthEnsemble
requirement but does have a way to stop (e.g., if it ends with an InXEnsemble
), then we have to overshoot before can_append
will return false. In this case, generate
returns a trajectory which has one frame too many; we must trim that trajectory in order to accept it.I see three possible solutions to this:
generate
must, if the trajectory returned by generate
isn't acceptable, also check the trajectory with one fewer frame, to see if we overshot. This is easy to implement, but it is an ugly approach because it requires the PathMover
to modify the trajectory, which really should be the job of the DynamicsEngine
. It also requires more work on the part of contributors who might add PathMover
s.can_append
into two functions: one that really is can_append
, and one that is not_overshot
. The dynamics_engine
would have to know whether it had to stop because it can’t append another trajectory, or whether the stop was because of an overshoot. If it overshot, then it would remove the extra frame and return the pre-overshoot trajectory. I think this is by far the best solution; the only problem is that it isn’t trivial to implement. If we agree that this is the way to go, I have a rough idea of how it would look, but still need to work out details.Personally, I think this (like anything with the "noturgent" tag) is in the "not urgent but important" quadrant of the Eisenhower matrix, meaning we should come back to it before release but not before getting working TIS. The current implementation will work fine for TIS, TPS, and AMS (and probably for FFS, steered TPS, and anything else we’re putting into 1.0).
We (I) should add some tests to check the open simulator object. Right now I just run the alanine_example which fails if the simulation takes too long.
Conda now defaults to MSMB3, which does not have the metrics
package, breaking Travis builds because orderparameter.py
calls for msmbuilder.metrics
.
This was given a temporary fix in #93 by fixing msmbuilder version to an old version (the same one I got through pip a while ago). This is not the best solution, so something better should be done.
Assigning to @jhprinz, since the MSM side of things is his expertise, and he can more easily correct the existing code.
The first thing I'm trying to do is to convert my XYZ files into your NetCDF format and vice versa. This will be an quick and dirty hack, since it isn't intended to be used very much. But I need to know how to do the following:
Cool, idea. Before I get into this, we could also write at some point a different simulator that will just generate trajectories in your example potentials. This does not have to
use OpenMM at all, but I guess that would be nice.
What you have to do is to generate a new netCDF file with a topology file that you have to create. This will be stored in a serialized form in the netCDF so that it is always available
if you continue your runs. When you create the netCDF it takes the atom number from the topology and creates a dimension called 'atoms' and sets it to the number. After that
the tables that holds the atom coordinates is fixed in this size and cannot be changed anymore (I think). The number of frames is set to 0 which means that it can grow in this
direction.
At the topology level:
- get and set number of atoms
- get and set atom name labels
Hmm, this is actually the most tricky part, since I am using an mdtraj.Topology object for this that contains all these information. For you you would have to construct
this for your system, which should be easy, if you have done this before. If you send be a rough outline of the structure I can assist you with this.
At the snapshot level:
- get and set snapshot velocities
- get and set snapshot positions
I recently changed the structure on this so we should make sure that you have the newest version. The main point was, that I did not have a way to store reversed frames.
I wanted to keep this information in a boolean which is effectively attached to the trajectory and not to a snapshot. So what I came up with so far is:
A 'Trajectory' is a list of 'Snapshot'
A 'Snapshots' consists of
A 'Configuration' contains
A 'Momentum' contains
And in the database there are several tables now that are used to reconstruct these objects
configuration_coordinates
configuration_potential_energy
configuration_box_vectors
momentum_velocities
momentum_kinetic_energy
trajectory_configuration_idx
trajectory_momentum_idx
trajectory_momentum_reversed
trajectory_path_hamiltonian # not used so far
trajectory_length
Unfortunately so far I am using a maximal trajectory length, but this can be easily changed to allow for infinite length trajectories, although storage is not effective then.
Alright, this makes it a little more complicated, but should only be under the hood and not make a problem for the user. He can just work with trajectories and snapshots.
But now we can also store easily snapshots without their momentum without problems. Or store the same coordinates with several velocities, which could be useful.
Also, reversibility is treated naturally and we just store the same momentum index and set the frame to be reversed in the snapshot.
Having said that how do we get the positions into the database. First make an empty trajectory
traj = Trajectory([])
You need to construct snapshots using
new_frame = Snapshot(coordinates = your_numpy_array_of_nx3_coordinates, box_vectors = your_box_vectors, kinetic_energy = whatever)
then save it using
new_frame.save()
or, if you have a problem because I did not correctly implement the saving
new_frame.configuration.save()
which stores only the coordinates, since the momentum is empty (I need to check if this really works).
Finally append this to your trajectory using
traj.append(new_frame)
and loop. Last save the trajectory so that we know the order of frames ;)
traj.save()
At the trajectory level:
- get and add snapshot in the trajectory
you retrieve a trajectory from the database using
traj = Trajectory.get(index)
get snapshot just like using a list. You can also use + to combine lists and use slicing as well as use a list itself to get only specific frames, etc…
my_snapshot = traj[index]
e.g.
new_trajectory = traj[0:10]
new_sliced_trajectory = traj[ [0,10,20,30,40] ]
snapshot = traj[2]
replacing and appending also works as usual
traj.append(my_snapshot)
snapshots can be reversed by
snapshot.reverse()
which just flags the snapshot that its momenta should be reversed if retrieved.
You get back the coordinates and/or velocities using
my_snapshot.coordinates # this is short for my_snapshot.configuration.coordinates
my_snapshot.velocities # NOT short for my_snapshot.momentum.velocities because my_snapshot.momentum.velocities returns the momentum in the attached snapshot while my_snapshot.velocities get back reversed ones depending on the reversed flag in snapshot !
provided of course that the snapshot has momenta or coordinates.Internally a missing configuration or momentum is marked by index zero (0). Interestingly even, if the momentum is missing it can still be marked as reversed. Might come in handy for later
analysis
At the trajectory-set (storage?) level (i.e., set of saved trajectories):
- get and add a trajectory in the trajectory-set
as I showed before. Make a trajectory by whatever means and then call save on it and it gets added, like
traj = Trajectory([ snapshot_1, snapshot_2 ])
traj.save()
or
new_traj = traj[0:100:2]
new_traj.save()
If you put a number in the save command (works also for Configuration and Momentum) you can replace an existing one.
So far I have not tested what happens if the index does not exist. I believe it just gets created by the netCDF library.
traj.save(10)
At the file level:
- read in and write out your data objects to a file.
I assume you want to write these as trajectory files? For this I am using mdtraj although I have not updated to version 1.0.0 yet.
http://mdtraj.org/1.0.0/api/trajectory_files.html
which allows to write most file formats. Unfortunately not .xyz format. You probably know best on how to do that. I have some functions to retrieve just coordinates as a numpy array which might help.
use
storage.trajectory_coordinates_as_array(trajectory_index, list_of_atom_coordinates (or None to get all))
here storage is a link to the global storage object. In the sandbox.py example this is given in
simulator.storage
to use mdtraj to store a trajectory, e.g. as a multi-frame pdb which can be read (very slowly) in VMD use the md() command on a trajectory to get an mdtraj trajectory object that can be stored. E.g.
my_traj.md().save_pdb('mdtraj.pdb', True)
The true at the end allows to replace existing files
One nice thing that a trajectory object can also do is to use subsets for specific atom_coordinates. So if you want to store only specific atoms you could use
small_traj = my_traj.subset([0,1,2,3,4,5,6])
which will generate a new trajectory object that remembers that certain output should only give atoms 0 through 6. This is still a little experimental after the most recent update.
I haven’t really tested it. Since we might mostly want only the solute, I have a shortcut for the subset with solute coordinates (if they are specified). You can just do
solute_traj = my_traj.solute
Internally these subset trajectories are only for post-processing. They never get saved and internally there is always the full coordinates saved. So finally saving the solute trajectory is done by
my_traj.solute.md().save_pdb('solute_traj.pdb', True)
We could add a XYZ saver, that could save some time.
As discussed in #37 (comment) and #69 (comment), we should implement a SlicedTrajectoryEnsemble
to return most of the functionality we previously had from the frames
attribute of VolumeEnsemble
s. The idea is to use the fact that trajectories are lists, and this subclass of AlteredTrajectoryEnsemble
modifies the trajectory to take a slice of frames, defined by a Python slice
object. The old approach didn’t play well with SequentialEnsembles
; this one should.
One question: what's the best way to handle single frames? I see 3 reasonable options.
frames
argument to __init__
, but internally convert that to a sliceSingleFrameEnsemble
)SlicedTrajectoryEnsemble
methods for single frames and slicesThat's roughly my order of preference, but I'd appreciate second opinions. My idea for the first approach is that the __init__
function (with argument frames
) would have something like:
if type(frames) == int:
self.frames = slice(frames, frames+1)
That gives the extra flexibility with the smallest amount of additional code to maintain.
I had this idea to extend the framework a little bit to also allow "us" to make regular (adaptive) MSM simulations. These usually work, by starting somewhere to run a lot of short trajectories. Then do analysis and then pick some samples to start more simulations from these points.
This is very similar to what we do, we only
We could even do this with several replicas easily. The storage is not super effective, but with all the order parameter logic etc might provide an easy way to analyze a system.
I think I will start implementing the necessary steps and see what happens.
(Parliamentary inquiry in the sense of a question about the rules of order -- not an investigation of wrongdoing.)
Where do we want to put scientific discussion?
A lot of what we'll be working on soon (e.g., how to define interfaces/order parameters based on cluster centers) are things where I think we'll try a few reasonable ideas and see what works. Mainly, I'm thinking about the kinds of results/discussions that I will normally share with Peter at the Monday meeting of our path sampling subgroup. What’s the best way to keep those of you at MSKCC in the loop as well?
As I see it, we have at least three choices (or more if we use tools other than this GitHub repo):
Maybe some combination of these (writing a TeX file and calling people's attention to it with a new issue?) would work best. Preferences? Suggestions?
(Note: I wrote all of this a couple weeks ago, and kept forgetting to post it. JHP, please take a look, and let me know what you think of the move_path
attribute, whether it'll cause problems for storage: I think there's a problem there we need to solve, but I'm open to other solutions.)
PathMover
s, included several classes which I’ve already mostly implementedMoveDetails
object, in particular replacing the mover
attribute with a move_path
attribute, which is a list of 2-tuples (mover, accepted)
Sample.details.accepted
can be derived from Sample.details.move_path
and how this allows much more analysis of the behavior of our PathMover
s (a key tool for practical troubleshooting)PathMover
s can be implemented using these combination toolsThe main thing I need is advice from @jhprinz on whether this Sample.details.move_path
will be a pain to add to storage.
If you see any holes in this implementation, let me know, but I think it will allow us to quickly add new move schemes in the future. It’s currently being implemented in the branch for PR #106.
As I’ve been putting together the framework for PathMover
s in general, I’ve been thinking about the fact that many of our more interesting moves in path space can be described as some combination of more elementary moves. This is much the same as how our Ensemble
s are made of particular combinations of other the elementary ensembles.
So the questions I’ve been working on answering are:
For the first two, we’re fine if the answers are only partially complete. In fact, I’m not planning to implement some parts of this. But we do want to correctly anticipate best way to handle the last one.
Before diving into the specifics, let me introduce the general idea of the move tree. The basic idea is that we typically have a decision tree to select which PathMover
will generate Sample
s for this step in the Monte Carlo. For example, perhaps you first choose a replica exchange out of the list of shooting, replica exchange, minus move, or path reversal. Then you have to choose a specific replica exchange pair to use.
In order to make this sort of process as flexible as possible, we use the idea of the move tree. In general (at least with PathSampling
approaches) we’ll have a root_mover
which leads to the whole tree.
The point is that we need to, in some way, record the move tree path that led to generating a specific Sample
. In fact, I’d argue that the Sample
should be labelled as generated by the move tree path, not just by a given mover (in fact, JHP already added something to this effect by having MixedMover
modify the sample.details
-- what I propose here is to generalize and complete that).
Some moves also consist of several sequential submoves. That’s also handled in this discussion.
I don’t claim that this list is exhaustive, but here are a few of the moves I have identified as elementary. We may add others as we go along:
ForwardShooting
and BackwardShooting
TwoWayShooting
(not to be implemented in version 1.0)ReplicaExchange
EnsembleHop
I think that a number of other moves (e.g., StateSwap
) are just special cases of ReplicaExchange
or EnsembleHop
. A few sections below, I’ll show that OneWayShooting
and MinusMove
can be described using these elementary moves and the combination tools I describe in the next section.
It’s possible that 2-way shooting can be implemented as a combination move. I’d prefer that, although I haven’t thought out all the details of what would be needed to make that happen.
MultiMover
s: Combining other movesWe need various ways to combine these moves to give the overall behavior we want. I’ve identified 5 such combinations; there may be others to be made. I describe each as a move consisting of submoves.
MixedMover
: Selects a submove at random (optionally with weighted selection probabilities). Does only that move.SequentialMover
: Does its submoves in order, attempting all submoves and applying them to the global state sequentially. For example, this could be a move scheme that involved doing all repex in increasing order of replica, followed by shooting on all replicas, followed by all repex in decreasing order of replica.PartialAcceptanceSequentialMover
: Does its submoves in order until/unless one is rejected. All moves up to that point are applied to the global state. For example, a bootstrapping move first does a Shooting
; if that is accepted, then it attempts an EnsembleHop
. But even if the EnsembleHop
is rejected, we want to apply the success of the Shooting
.ConditionalSequentialMover
: Does its submoves in order until/unless one is rejected, but if one is rejected, previous steps are not applied to the global state. For example, a MinusMove
involves first attempting a replica exchange, and then doing an extension (specialized shooting) move. If the replica exchange fails, then we never start the expensive extension process. If the extension fails, then the whole move is rejected (including the successful replica exchange).SimultaneousMover
(not implemented in version 1.0): This is similar to the SequentialMover
, but instead of updating the global state after each move, all samples are generated starting from the same global state. I think this is only a good idea if there’s no overlap in the replicas affected by the submoves (for example, when doing a shooting move on all replicas). In principle, this could be parallelized. For now, such cases give equivalent results if run as a SequentialMover
.As discussed in #104, we could either make moves like these as pseudoclasses (i.e., offer convenience functions that construct them, but don’t actually have a class named, for example, OneWayShooting
) or we can make actual classes which are mostly subclasses with different __init__
functions. I prefer the latter, because I think it will make analysis easier, and it may make fancier moves possible as well. Advanced users can always use either approach to create their own moves.
Subclass of MixedMover
, consisting of ForwardShootingMover
and BackwardShootingMover
, both using the same ShootingPointSelector
. This is, indeed, how we’ve already implemented this.
Subclass of MixedMover
, consisting of ForwardMinusMover
and BackwardMinusMover
. Each of those are, in turn, subclasses of ConditionalSequentialMover
. The submoves of those ConditionalSequentialMover
s are a replica exchange move followed by a shooting move (which selects the endpoint as the shooting point).
MoveDetails
One part that gets tricky is that we now have to manage all the details of this information in the MoveDetails object. My suggestion is to replace the MoveDetails.mover
object with a list, MoveDetails.mover_path
, which consists of 2-tuples of (mover_ID, accepted)
for each of the moves in the move tree path that led to this sample. MoveDetails.accepted
is true if and only if all moves in the tree path were accepted.
This allows us to easily analyze acceptance ratios (which give us the first sign of problems) from many different angles: by starting with a move that includes all shooting moves, we can get the overall shooting accepted. Then we can drill down to see the acceptance for each individual replica. Then we can drill even further down to see how forward/backward shooting acceptance ratios differ for each replica.
To make this work, we need to think about the acceptance given in the mover_path
2-tuple for each multimover. For all movers except ConditionalSequentialMover
, this is always True
: acceptance of the sample depends only on acceptance of the submove. The previous paragraph is done by taking all Samples
containing a given mover
in their sample.details.mover_path
s, and looking at sample.details.accepted
.
But consider a MinusMove
where the replica exchange part is accepted, but the extension is not. This will generate two samples: one from the replica exchange, and one from the extension. The mover_path
entry from the replica exchange sample will say that the replica exchange was accepted, whereas the entry from the extension part will say it was accepted. The key is that in BOTH cases, the mover_path
entry associated with the ConditionalSequentialEnsemble
will be rejected. In this case, we can analyze the acceptance ratio of the replica exchange part by looking at its contributions to the mover_path
list, which will give different information from the overall acceptance given by sample.details.accepted
, which would be false for both of these.
The only downside of this approach is that it does require some modification to storage. Basically, we need to be able to store these move_path
lists. @jhprinz, would that be hard to implement?
Like in the toy_dyanmics example of David. He uses a function on the simulator to get a snapshot object.
So the simulator is responsible to create the snapshot not the snapshot using a simulation itself!
Or we add function to the simulator that get called by the Snapshot.init to get the necessary stuff from the simulator. Although I like the first way better. Snapshots itself can only be build using coordinates, energies, etc but not using a simulator/context object.
You can use the mdtraj template for this:
https://github.com/SimTk/mdtraj
https://github.com/SimTk/mdtraj/blob/master/.travis.yml
I think it would be good to get somehow David and Peter access to the GPU cluster if there is a way to do so. Mainly for faster testing right now than for production. It would also be a good test for a potential conda package.
Also, there is a way to use AMD graphics cards, but not on a Mac. Linux should work though!
in the Travis-CI setup we actually already create a conda package and it would be great to make this publicly available (after a successful build). I think we are almost at a stage where we could do that. That would also come in handy when we want to test the framework on a cluster.
I looked at the way mdtraj does this and (as John mentioned) we need encrypted key to do so. My question @jchodera would be what account to use on binstar for this. I have one but have not access to user omnia so I cannot upload to binstar for this account and I am not sure if my account can host it or if we want my account to do so.
Ultimately it should be part of omnia. What is the best way to do so and is there a way to keep this private until we are confident to really make this publicly available?
Find samples that have been generated by specific path of PathMovers.
LambdaVolume
as it stands doesn't work for volumes which wrap around periodic boundaries (e.g., defining the \beta state of alanine dipeptide in terms of the \psi angles). This can be hacked as the union of two LambdaVolume
s, but I think a separate subclass class will be cleaner for the user.
I'll do this (already assigned to self); just making the issue to remind myself to come back to it.
As I’m working on some of the PathMover
objects, I realize that there are a lot of cases where moves can be considered combinations of other moves. The question I have is whether we want to create them as proper classes in their own right, or whether we want them to be generated using convenience methods.
For example, a OneWayShootingMover
is really a MixedMover
that randomly picks to either do a BackwardShootingMover
or a ForwardShootingMover
. Do we want the returned type of OneWayShootingMover
to be MixedMover
or OneWayShootingMover
? With ensembles, we’ve chosen to not treat them as classes (for example, a TISEnsemble
is just a shortcut for the user to create a special kind of SequentialEnsemble
; the code only sees it as a SequentialEnsemble
.
If moves are not separate classes:
.name
attribute, which normally defaults to class name but could be something else to make analysis easier.If moves are separate classes:
__init__()
function (assuming the .move()
is just a combination type we've defined of the moves we’ve defined). If that isn’t enough, it’s easy for the contributor to see how to make much more complicated things by overriding .move()
I think the best thing to do is probably to treat these separate classes. Note that both approaches will be available to users: they can always define one-off moves without creating subclasse, or (of course) they can always create subclasses. The question is which approach we use, and therefore recommend by example.
Do you want to rename this GitHub repo from opentis
to openpathsampling
?
Peter @bolhuis suggested that it might be useful to have a splitting feature for the file storage. I thought this should be easy and kind of failed to do quickly implement it. So now I have a draft version that allows us to call a "split" function that then creates a new file and continues storing in the second file, which is itself a valid netCDF4 Storage files. Internally it looks exactly as a normal Storage so that no code outside of the Storage is touched.
To do so, I needed to subclass the netcdf.Variable
and especially netcdf.Dimension
to controle the splitting of a dimension. This means if a file is split the class knows how many items are in each file and if you request no 500 then it find the correct file adjusts the position and reads the object.
This all works so far, although I need more testing. I think this is especially useful to create milestones in simulations and have files, that are completely separated. And can already be moved and/or analyzed.
Also all of this is independend of the previous implementations. There is just a new subclass of storage that is different. So, we can completely ignore the splitting of we want to.
There are a few questions I have though:
Lets assume that each file can store 100 configurations and if I request number 150 then it will load no50 from file no. 2. Simple. The problem is that all objects can reference each other, which means, that in file 2 might be a trajectory that points to configuration 10, which is in file 1 and even if a trajectory in file two points to object 150, this object is found in the same file at position 50 not 150. This means, that although the file looks intact like a storage, without the other files it is partially meaningless. And I would like to avoid this.
I think, it would be better to have parts that are completely intact (if this makes sense). This will require two additional features:
One second problem. Right now I store order parameters per configuration. This means that an order parameter has the same table length as there are numbers of configurations in the file. This means for splitting that if we create a new order parameter (or save it!) after we have started splitting, we need to add the orderparameter to all files. Also, if we add cached data to the order parameter and save again. It will potentially need to access all storage files, which kind of breaks the nice property of having only one active file and the rest is immutable.
I would prefer this immutable idea so that we can really take first N-1 split files and store or analyze them already. The main problem stems from the fact that an orderparamter object is not immutable and its cache will change over time. There are several ways to fix this:
I would prefer no. 1 and kind of make the order parameter immutable in the file. It could still happen that we save a configuration again for completeness and in the meantime have more order parameters. Then these of course would be also stored. This way we keep each file consistent and can maybe only use the last file for a restart.
We would like to be able to split the storage files fow various reasons. I would like this property to be independent of our old concept and allow to use this, if necessary and have each file to be
All of the ideas are of course consistent with our current single file approach and require almost no changes.
The additional files and numbering will increase the read overhead (writing should be fine). Especially if we have to find first, in which files an object is present. The completeness overhead should make this not too bad since related data can always be found in a single file.
Stupid question, but
filename.[###].nc
or
filename.nc.[###]
or do we want our own fileending?
filename.[###].ops
To allow to create several engine objects I would like to move the storage creation out of the engines and let this be handled by the user. So the storage has all the information about the system, n_atoms, topology, an initial configuration which is what an engine might need
storage = Storage('file.nc', mode='w/a', topology=..., n_atoms=..., configuration=...)
engine = OpenMM(storage=storage, opts) # creates new engine
# save the options in an engine used to generate it
storage.engines.save(engine)
# create an engine using the stored options and information in the store
old_engine = storage.engines.load(0, OpenMM)
This would save exactly what we save right now, just in a more separate way.
Next, we would still have to pass the engine to the PathMover class. Either by setting the global using
PathMover.simulator = engine
or per mover
mover = ForwardMover(..., engine=engine)
Suggestions?
Per discussion in #65 (comment) ff., Toy Dynamics should have a separate object (perhaps called ToySystem
or ToySystemState
), such that for simulation = ToySimulation()
, instead of
vel = simulation.velocities
pos = simulation.positions
we have
vel = simulation.current_system.velocities
pos = simulation.current_system.positions
This makes Toy Dynamics more pedagogically useful
A new issue to discuss name changes:
class Snapshot()
[the object holding a configuration, a momentum and a reversed flag]
=> keep
=> Frame()
Snapshot.configuration
[the pointer to the configuration]
=> keep
=> Snapshot.coordinates
=> Snapshot.configuration
=> Snapshot.potential (better not, although we could have potential.energy, kinetic.energy)
Snapshot.momentum
[the pointer to the velocities]
=> Snapshot.velocity
=> Snapshot.velocities
=> Snapshot.kinetic
class Configuration()
[the object holding coordinates, box_vectors, potential_energies per frame]
=> keep
=> class Coordinates()
Configuration.coordinates
[table of atom positions]
=> keep
=> Configuration.xyz
class Momentum()
[the object holding velocities and kinetic energies per frame]
=> class Velocity()
=> class Velocities()
=> class Kinetics()
Momentum.velocities
[the actual velocity table]
=> Velocity.coordinates
=> Velocity.xyz
=> Velocity.velocity_xyz
coordinates_xyz = snapshot1.coordinates.xyz
velocities_xyz = snapshot1.velocities.xyz
snapshot1.coordinates.potential_energy
snapshot1.velocities.kinetic_energy
snapshot1.coordinates.box_vectors
Ensemble.forward(trajectory)
[the function that checks if a trajectory can still be in an ensemble if extended by one frame at the end ] => Ensemble.continue_foward(trajectory)
Ensemble.backward(trajectory)
[the function that checks if a trajectory can still be in an ensemble if extended by one frame at the beginning ]
=> Ensemble.continue_backward(trajectory)
What license do you guys want to use?
How about the same LGPL that OpenMM is released under? That's reasonably permissive and consistent with other projects in the lab.
@dwhswenson and me discussed that it would be nice to use different unit system in the storage which is especially useful when it comes to using ToyModels.
We would like to set units to be used for storage and then, if we store a Quantity()
in the wrong units they get converted automatically.
I found that simtk has such a feature already which I would use, but am not sure how to treat non-physical unit systems where we do not use units. Maybe we just should not do that. Since I have almost implemented that feature I propose that:
storage.init_variable('configuration_coordinates', 'float', units = simtk.unit.meters)
and since in the standard md_unit_system length is in nanometers this would effectively call
storage.init_variable('configuration_coordinates', 'float', units = 'nanometers')
or
storage.init_variable('configuration_coordinates', 'float', units = 'nm')
When we now store data of dimension length
we can automatically convert to the unit in the storage and save. E.g. when we use a storage with ångström the values from openmm (in nanometers) get multiplied by 10.
We have two objects, Simulator
and Simulation
, which have very similar names and very different purposes. I think that will be terribly confusing for users. In addition, we have a Calculation
class whose purpose might also be confused with the other two.
Here is my description of what each object does (@jhprinz, let me know if I'm missing any aspects):
Simulation
: interacts directly with the underlying MD engine. It just transfers state information (positions, velocities, etc.) back and forth between MD engine and our code.Simulator
: this one is a bit harder to describe. It mainly involves two functions: (1) setting up the system (including setting up our storage and setting up topologies, force fields, etc. for the MD engine); (2) the generate
function, which runs the MD and stops when we reach a stopping condition.Calculation
: think of this as the main()
for whatever kind of sampling we're doing: it's the part that distinguishes the overall sampling algorithm, e.g., making TIS different from SATIS or AMS.Originally the Simulator
/Simulation
divide was because we needed our own class (Simulator
) to interact with OpenMM's Simulation
. I think we can clean that up now; whether we also need to change the Calculation
name is up to you.
Here's my suggestion for refactoring:
Let specific Simulation
s (e.g., OpenMMSimulation
) inherit from Simulator
(which might be renamed to Simulation
). For cases like OpenMMSimulation
, which currently inherits from openmm.app.Simulation
, we could use multiple inheritance, or we could make the openmm.app.Simulation
into an instance variable for the OpenMMSimulation
. So there would be a superclass with the code to set up the storage
and for the generic generate
function, and the subclasses would support system set-up and dynamics state information for OpenMM, or for my ToyDynamics, or (perhaps one day) for Gromacs.
From a user perspective, this would mean that instead of having to build both a Simulator
and a Simulation
, you just need a single object which combines information from both. In fact, the final object is kind of an OpenTIS wrapper around the MD engine's objects, providing a few extra functions and the ability to translate to and from our data structures.
If this seems like a good idea, I can implement it. Any thoughts?
Two parts:
Part 1. Meaning/usefulness of Ensemble.forward()
(and .backward()
)
I’m a little confused as to what Ensemble.forward(traj) is supposed to do. The docs say that it is supposed to let you know whether you might still be in the ensemble if you add another frame. In other words, it is the logical not
of a stopping condition. If that’s the goal, shouldn’t it basically boil down to:
self.volume(frame)
).I'm not quite clear on what the code in HitX and LeaveX is doing, but it is significantly more complicated than returning True.
Part 2. If I’ve understood that correctly (or even close to correctly), there’s a logic problem in NegatedEnsemble
. Using notation of x for a trajectory, x[t] for a frame, and vol for a given volume, I’ll explain:
Let’s talk about NegatedEnsemble(LeaveXEnsemble(vol))
. Since LeaveXEnsemble(vol)
is
{ x | exists x[t] where x[t] not in vol },
NegatedEnsemble(LeaveXEnsemble(vol))
means
not { x | exists x[t] where x[t] not in vol } == { x | not exists x[t] where x[t] not in vol }
and that, in turn, is just the contrapositive of
{ x | for all x[t]; x[t] in vol }
which is InXEnsemble(vol)
. So NegatedEnsemble(LeaveXEnsemble(vol)) == InXEnsemble(vol)
. This makes sense to me: If you never have a frame outside the volume, then all your frames are inside the volume.
Now consider a trajectory traj
which consists of one frame in a volume vol
. What is the expected behavior of .forward(traj)
for various ensembles? At this point, LeaveXEnsemble(vol).forward(traj) == True
, right? If a trajectory goes forward, it is possible for it to eventually leave the volume vol
. Also, InXEnsemble(vol).forward(traj) == True
. It's now in the volume, and it might stay.
However, in its current implementation, NegatedEnsemble(LeaveXEnsemble(vol)).forward(traj) == False
, because NegatedEnsemble(ens).forward
returns not ens.forward
. Flipping the forward()
in NegatedEnsemble
isn’t logically sound.
I've created an S3 bucket called openpathsampling
that we will have openpathsampling.org
point to, much like http://mdtraj.org does now.
@pgrinaway can help configure this.
You should also feel free to make use of the test systems in openmmtools.testsystems
to set up some tests and examples for users.
We can also add examples you find useful to that file to make it easier to import and share them.
What would you guys think about eventually targeting msm-tis
(or whatever you'd like to call this in the end) as part of the conda-installable Omnia
collective?
The goal here is to collect a number of OpenMM-based packages into conda-packaged tools that share some common structures and file formats, with the main goal being ease of installation/use and interoperability.
We're trying to standardize some aspects of this, such as how documentation is made available, how travis tests are run, packaging standards, etc.
@kyleabeauchamp can provie more information about how this works with other projects like mdtraj.
I think the current names of the MoveDetails
attributes are a bit confusing. Before we get too far into making more PathMover
s, I'd like to change some of those names. In particular, I think final
vs result
is a little unclear, and success
vs accepted
vs acceptance
is unclear.
Here are my suggestions, in the format oldname => newname
: description
Things to change:
final => trial
: a Trajectory
representing the trial trajectory (what will go to the output sample if the move is accepted)acceptance => acceptance_probability
: the probability of accepting this step (i.e., Metropolis acceptance probability). Note that for many moves, this is 1 or 0.accepted => trial_is_in_ensemble
: this seems to only be the test of whether the trial path in a shooting move is in the correct ensemble. Unless I'm misunderstanding, it should be moved out of general move details, and only exist for shooting movers.success => accepted
: whether the trial is accepted by the Monte Carlo methodThings to keep the same:
inputs => inputs
: list of the Sample
s selected to be used in the moveresult => result
: the Sample
that gets returned by the movemover => mover
: the PathMover
which generated these detailsNote that right now, the things I say are Sample
s are actually Trajectory
objects. For now, I'm leaving them as Trajectory
, but I think we might want to change that to Sample
eventually. (Should result
be a Sample
which is recursively referential, or should it be the Trajectory
of that sample? Gets weird when thinking about EnsembleHopMover
s, where the Trajectory
doesn't change.)
Any other thoughts on these names? Suggestions for better names to use?
I’d like to start fixing up the GlobalStateMover and related classes. I picture something that (based on our current naming, which I consider open to changes) looks like this:
PathMover
: generates individual Sample
objects; these are the "atoms" of an overall move: shooting move (on a single ensemble), replica exchange move (on a single pair), etc.GlobalStateMover
: this allows us to combine several of the "atom" moves at once. For example, we may want to do shooting moves on all ensembles at once. This groups the PathMover
moves together, and statistical sampling is done after each of these.GeneralPathMover
: this may actually be a factory function, but the point is that it should take nested lists of path movers and create an appropriate GlobalStateMover object to handle all moves in the simulation. It's basically a convenience function to create the kind of GlobalStateMover
we often use in TIS.A more detailed explanation (based on how these will be used) follows; please let me know what you think of this plan soon, since I’ll start implementing it as soon as I can! Also, as far as I’m concerned, all names are open for changing.
I should also clarify that, at this point, we’ve implemented a few things using PathMover
s, but we haven’t set up GlobalStateMover
s (beyond a skeleton) or GeneralPathMover
s.
PathMover
shooter = ShootMover(UniformSelector(), ensemble)
repexer = ReplicaExchangeMover(ensemble1, ensemble2)
mixedmover = MixedMover([shooter, repexer])
shooter
does is the shooting move for a given ensemble. repexer
is the replica exchange mover swapping the two given ensembles. The MixedMover
has a 50% change of doing replica exchange, otherwise it does the shooting. The probability can be changed by including another parameter, probability
when setting up the MixedMover.
None of these increment the step on the Monte Carlo sampling: that is, they contain no information about when you should take samples from your total distribution for statistics purposes. This way, you can combine PathMover
s in a way what the subsets of the ordering doesn’t satisfy detailed balance, but the whole sequence (defined in GlobalStateMover
) still does.
There will also be factory convenience methods for common groups of replica exchange movers:
PathMoverFactory.NearestNeighborReplicaExchange(ensemble_list)
PathMoverFactory.AllReplicaExchange(ensemble_list)
NearestNeighbor
is frequently used (sometimes with a few extra exchanges included); I've only ever used AllReplica
when testing out some new ideas.
GlobalStateMover
all_shooting = GlobalStateMover([shooter1, shooter2, shooter3])
all_shooting.move()
will simultaneously do shooting moves of all 3 types, and will increment the Monte Carlo step, saying that you should sample after each of these moves.
GeneralPathMover
The GeneralPathMover
is a way of lumping all of this together to simplify setup. You general one huge list of movers: if all the objects in the list are of the PathMover
type, then it makes a GlobalStateMover
. Otherwise, it makes all of the objects in the list into GlobalStateMover
s (you can make a GlobalStateMover
with a list of one PathMover
).
all_moves = GeneralPathMover([[shooter1, shooter2], [repex1], [repex2])
Each list of PathMover
s is a GlobalMover
. Each list of GlobalMover
s is a GlobalMixedMover
. GlobalMixedMover
s can be nested. Here we again have to deal with probabilities. My suggestion is that the default be that each PathMover
have equal probability, and to take the option probability='equal'
to make all GlobalStateMover
s within each GlobalMixedMover
have an equal probability.
Putting all of this together, the idea is that you'd be able to eventually set up a TIS calculation for a given ensemble_set
with something like this:
# assume we've already set up ensemble_set, a list of ensembles
from opentis.pathmovers import PathMoverFactory as mf
shooters = mf.OneWayShootingSet(UniformSelector(), ensemble_set)
repexers = mf.NearestNeighborReplicaExchange(ensemble_set, as_global=True)
reversal = mf.PathReversalSet(ensemble_set)
minus = MinusMover(ensemble_set[0], stateA)
general = mf.GeneralPathMover(movers=[shooters, repexers, reversal, minus])
tis = PathSampling(storage=storage, engine=engine, movers=movers)
To get the best efficiency, you'd probably have to also include a probability
argument for the GeneralPathMover
. A few of the exact calls here might change, but the idea is that it shouldn't really get any more complicated than that.
After all this the conda package is still missing. I think this is simply part of the
after_success.sh script so upload the created conda package to binstar. Would that be good for
Peter for example?
I think we don't really need this right now, do we?
This is a heck of a long issue (more of a manifesto, really), but I've been working a lot on ensemble.py
and I'm suggesting a few changes which involve imposing a little more structure on the ensembles we've defined, but which will allow us to do just as much (or more) with a lot less code to
manage.
The big picture of what I'm proposing is to move our more complicated ensembles from a single logic-combined ensemble into an ensemble which consists of a time-ordered sequence of ensembles (SequentialEnsemble
).
We'll probably need the SequentialEnsemble
for the minus interface anyway, so I've already implemented it (currently testing it and fixing cases as I identify them).
(Also, in this issue I use the notation can_append
and can_prepend
: I'm still open to changing those names, but I need to call them something to discuss them!)
Let's start with a standard TIS path ensemble. Assuming we've defined volumes for stateA, stateB, and interfaceA0, our current notation describes this interface as
( InXEnsemble(stateA, frames=0) &
OutXEnsemble(stateA | stateB, frames=slice(1,-1)) & LeaveXEnsemble(interfaceA0) &
InXEnsemble(stateA | stateB, frames=-1)
)
I think this is pretty clear, and corresponds directly to our mathematical notation $h_A(x_0)\prod_{J\in{A,B}} \bar{h}J(x_1,..., x{L-1}) \hat{h}{A,0}(x) \sum{J\in{A,B}} h_J(x_L)$.1
On the other hand, we can also base this whole thing on the SequentialEnsemble
. In that case, in addition to stateA
, stateB
, and interfaceA0
, we'll need an ensemble length1
for path segments with 1 frame (That's just JHP's LengthEnsemble(1)
). Then our TIS path ensemble would be defined by:
SequentialEnsemble( [
InXEnsemble(stateA) & length1,
OutXEnsemble(stateA | stateB) & LeaveXEnsemble(interfaceA0),
InXEnsemble(stateA | stateB) & length1
] )
The path starts in stateA
(for one frame), then is out of stateA | stateB
, eventually crosses interfaceA0
, and ends with exactly one frame in stateA | stateB
. I think that's pretty easily read from the code, and tends to be the way we actually talk about interfaces.
When trying to define the minus interface, this gets a bit more confusing. For the sake of discussion, I'll define the minus interface as including the segment that is being extended, so a path in the minus interface consists of two segments that satisfy the innermost interface ensemble, with a segment in between which does not cross the interface. If we do not require that interfaceA0
be equivalent to stateA
, I haven't come up with a way to do this with just logical ensemble combinations2, and it gets harder if you allow more excursions for better decorrelation, as Peter and I did in our work on the "geisslerene" fluid. However, with a SequentialEnsemble
, it isn't too bad:
SequentialEnsemble( [
InXEnsemble(stateA) & length1,
OutXEnsemble(stateA) & LeaveXEnsemble(interfaceA0),
InXEnsemble(stateA) & length1,
InXEnsemble(interfaceA0) | length0,
OutXEnsemble(interfaceA0),
InXEnsemble(stateA) & length1
] )
In that, I've also defined a length0
ensemble for trajectories of length zero; using ensemble | length0
is a way to make frames matching ensemble
be optional; in the case of the minus interface it's there just in case interfaceA0
is very close to (or equal to) stateA
and we have a path that enters stateA
for one frame before doing the second exit for the minus interface.
Thanks to python's delicious syntactic sugar for lists, doing the multiple excursion version almost as simple:
SequentialEnsemble(
[ InXEnsemble(stateA) & length1,
OutXEnsemble(stateA) & LeaveEnsemble(interfaceA0)
InXEnsemble(stateA) & length1
] + [
InXEnsemble(interfaceA0) | length0,
OutXEnsemble(interfaceA0),
InXEnsemble(stateA) & length1
] * N_l
)
where N_l
is as we defined it in our paper.
We can think of ensembles as falling into categories depending on what trajectory information they interact with. We already have several categories of ensembles. The CombinationEnsemble
is the main ensemble for doing logical combinations. The LengthEnsemble
considers the total trajectory length. The various subclasses of VolumeEnsemble
consider the path of the trajectory through defined regions of some collective variable subspace.
However, in the current implementation, all of these can apply to specific frames as well. This means that we're mixing time and time-ordering roles with the other roles. My proposal is to put all the time-ordering roles into the SequentialEnsemble
, and therefore only leave the Volume
tests to the VolumeEnsemble
s. By making logical combinations of VolumeEnsembles
with SequentialEnsemble
s and LengthEnsembles
, I think we can do everything that the frames
attribute of VolumeEnsemble
s allowed previously (at least, I've managed to come up with ways to do crazy examples I tried to come up with: please try it some yourselves!)
The advantage of removing the frames
attribute is that it significantly simplifies can_append
and can_prepend
. These functions become one or two lines for any ensemble (except SequentialEnsemble
: that one was a bit tricky). By having clear and orthogonal roles for each category of ensemble, I think it is also easier to correctly create more complicated ensembles.
Furthermore, this removes the need for the AlteredTrajectoryEnsemble
s. At least, as those exist now, they're basically a hack to let you generate part of a trajectory, particularly in the case of one-way shooting.
SequentialEnsemble
is more complicated than in the other ensembles. However, I think we need it anyway for the minus interface.First, the can_append
and can_prepend
methods for the primitive volume ensembles get massively shrunk. Like, for HitXEnsemble
and LeaveXEnsemble
they're reduced to the line return True
. InX
and OutX
just call themselves on the first/last frame of the trajectory. There's considerably more fanciness in SequentialEnsemble
's implementation of these two functions, but we were going to need that anyway.
Second, the classes ForwardAppendedTrajectoryEnsemble
and BackwardAppendedTrajectoryEnsemble
were only used to help one-way shooting. They aren't needed for that. (There might be other tricks they are needed for, but I think most things can be replaced by SequentialEnsemble
. And we don't have to implement anything else for two-way shooting, either.
Third, other time-order-dependent VolumeEnsemble
subclasses, like EntersXEnsemble
and ExitsXEnsemble
, will also be obviated.
So framing things in this way removes the need for several classes that we've already implemented and a couple more that we were likely to need to implement for two-way shooting. It also brings a couple of dozen-line functions down to one- or two-liners. The cost is one class (with some admittedly difficult to understand methods) that we probably would have had to implement anyway.
There are a couple of warnings to be concerned about:
not
might not play well with this (see #32, and JHP's partial fix #34)(.*)c
matching everything ending in c
, it matches everything: the first pattern keeps matching until it fails. (This is probably fixable, and only requires a change in one function.)I'm running tests of SequentialEnsemble
: all functions work on all the test cases I've given in so far (out of a library of 50-ish toy trajectories; not all used on every test), but I've only tested ensembles based on InXEnsemble
and OutXEnsemble
. I still need to add support for the zero-length trajectory and I need to test with LeaveXEnsemble
and HitXEnsemble
.
Actually, there's also a bit in the code to say that the path must contain at least 3 frames: without that a path which went from A to B in one step would be accepted, leaving you no shooting frames. But if that happens, you're doing something very, very wrong). ↩
If stateA == interfaceA0
, then you can define a class that counts the entrances and exits of the ensemble; both must equal 2 (or N_l+1
in general). ↩
Two parts to this issue: a new feature, and a name change suggestion inspired by the new feature.
While @bolhuis was looking at the code the other day, he suggested that we create an OptionalEnsemble
class to handle optional parts of SequentialEnsembles
. This can be basically a wrapper for what we already do, which is to or
an ensemble with LengthEnsemble(0)
. Conceptually, this is a bit similar to a regular expression (condition)*
: zero or more frames must match this condition. The point is that what I’ve been doing so far, explicitly or
’ing the ensemble being made optional with a LengthEnsemble(0)
, is ugly and unreadable.
I’d like to rename AlteredTrajectoryEnsemble
to an AlteredEnsemble
. The way these ensembles work is to wrap an ensemble given in initialization (saved as self.ensemble
) and change the ensemble functions to work not on trajectory
but some function of trajectory, self._alter(trajectory)
by calling self.ensemble.function(self._alter(trajectory))
(where function
is __call__
, can_append
, can_prepend
). The default version (the superclass AlteredTrajectoryEnsemble
) makes no change to the trajectory, and so should give the same results as the original ensemble.
I realized that the same machinery could be re-used by not only having subclasses that change the trajectories, but other subclasses that change the underlying ensemble. In these, instead of taking the given ensemble and putting it directly into self.ensemble
, we save a modified ensemble as self.ensemble
. Using the default (unchanged) self._alter
means that the AlteredEnsemble
would call a modified ensemble on the unmodified trajectory.
In general, I have qualms about mixing two types of effects (alter how ensemble sees trajectory, alter ensemble itself) in one class, and I’d definitely recommend against mixing the two in any one subclass (unless there wasn’t another way). But it also seems that the overlap in functionality is pretty large between these two. And AlteredEnsemble
as a superclass reduces implementation of OptionalEnsemble
to its .__init__()
and .__str__()
methods.
The other option here would be to have two classes, AlteredEnsemble
and AlteredTrajectoryEnsemble
, which implement nearly the same behavior as superclasses, and then have different trees of subclasses.
Any arguments against the name change?
It would be great if we could add one or two of the CB7 host-guest systems from SAMPL that @bas-rustenburg and @pgrinaway are working on, since I think this would be a nice, simple system to study bimolecular association/dissociation with. @bas-rustenburg and @pgrinaway are setting this up for OpenMM in implicit and explicit solvent, so you can probably just use their input/output files for this and the OpenMM Amber inpcrd/prmtop
reader.
If you have outdated or corrupted netCDF files, or if the files have been misused in some way, we generate an IndexError. It would be really nice if we caught those errors and added some information to make them a bit more informative. Example partial backtrace:
File "/Users/dwhs/Dropbox/msm-tis/opentis/openmm_engine.py", line 74, in auto
return OpenMMEngine._restore_from_storage(filename)
File "/Users/dwhs/Dropbox/msm-tis/opentis/openmm_engine.py", line 116, in _restore_from_storage
engine = OpenMMEngine(storage.template, options)
File "/Users/dwhs/Dropbox/msm-tis/opentis/storage/netcdf_storage.py", line 173, in template
return self.snapshot.load(int(self.variables['template_idx'][0]))
File "/Users/dwhs/Dropbox/msm-tis/opentis/storage/wrapper.py", line 68, in inner
obj = func(self, n_idx, *args, **kwargs)
File "/Users/dwhs/Dropbox/msm-tis/opentis/storage/snapshot_store.py", line 34, in load
configuration_idx = self.configuration_idx(idx)
File "/Users/dwhs/Dropbox/msm-tis/opentis/storage/snapshot_store.py", line 104, in configuration_idx
return int(self.load_variable('snapshot_configuration_idx', idx))
File "/Users/dwhs/Dropbox/msm-tis/opentis/storage/object_storage.py", line 434, in load_variable
return self.storage.variables[name][idx]
File "netCDF4.pyx", line 3015, in netCDF4.Variable.__getitem__ (netCDF4.c:41549)
File "netCDF4.pyx", line 3653, in netCDF4.Variable._get (netCDF4.c:49443)
IndexError
At some point on that backtrace, a clearer error message would be nice. (Most of this is that I'm an idiot: the problem is solved by deleting the bad file, but I forget that and spend hours trying to figure out which part of my recent full reinstall is broken and causing the error. Please, save me from continuing to waste hours being an idiot!)
I've registered openpathsampling.org
on godaddy. We'll configure this to point to the AWS S3 bucket openpathsampling
so you can push docs there once they are built by travis.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.