Code Monkey home page Code Monkey logo

pytraj's Introduction

Install | Features | How to get started? | Visualization

Build status Coverage Status

pytraj website

PYTRAJ

A Python front-end of [cpptraj program] (https://github.com/Amber-MD/cpptraj) (a data analysis package for biomolecular simulation).

Website: http://amber-md.github.io/pytraj

Features

  • support more than 80 types of data analyses (rmsd, radgyr, autoimage, pca, clustering,...)
  • read/write various file formats (.nc, .mdcrd, .dcd, .trr, .xtc, .pdb, .mol2, ...)
  • fast (core codes were written in C++ and Cython)
  • support parallel calculation with trivial installation (openmp, multiprocessing, mpi, ...)
  • interactive analysis with large trajectory data that does not fit to memory
  • [>> many more with comprehensive tutorials] (http://amber-md.github.io/pytraj)

Install

Supported platforms: Linux, OSX

  • The best way is to install AmberTools via conda: conda install -c conda-forge ambertools compilers (https://ambermd.org/GetAmber.php)

  • from AMBER suite distribution http://ambermd.org/.

  • from conda: conda install -c ambermd pytraj # Outdated versions, not support python >= 3.9

  • from pip: pip install pytraj # Outdated versions, not support python >= 3.9

  • from source code:

    git clone https://github.com/amber-md/pytraj
    cd pytraj
    
    python setup.py install
    
    # Note: openmp will be turned off in OSX.
    
    # AMBER user: overwrite pytraj in $AMBERHOME
    # For expert user only
    python setup.py install --prefix=$AMBERHOME
  • Getting trouble? : check our webpage

How to get started?

Contributors and Acknowledgement

Please check here

Citation

If you would like to acknowledge our work, please cite both cpptraj and pytraj.

Something like:

"...used pytraj [1], a Python package binding to cpptraj program [2]"
  • [1] PYTRAJ: Interactive data analysis for molecular dynamics simulations. Hai Nguyen, Daniel R. Roe, Jason Swails, David A. Case. (2016)

  • [2] [PTRAJ and CPPTRAJ] (http://pubs.acs.org/doi/abs/10.1021/ct400341p): Software for Processing and Analysis of Molecular Dynamics Trajectory Data. Daniel R. Roe and Thomas E. Cheatham, III Journal of Chemical Theory and Computation 2013 9 (7), 3084-3095

Question/Suggestion?

nglview with pytraj in Jupyter notebook

Demo: Interactive data exploration with [Jupyter notebook] (http://jupyter.org/)

pytraj website

License

GPL v3 (since pytraj is derived work of cpptraj)

But if you would like to reuse code snippets and pieces independent of cpptraj, I am (Hai) happy to license them (pieces of codes) under BSD-2 Clause or whatever you like. Just buzz us.

pytraj's People

Contributors

c56pony avatar drdomenicomarson avatar drroe avatar ecederstrand avatar envirodug avatar hainm avatar hichiaty avatar hongbo-zhu-cn avatar multiplemonomials avatar quantifiedcode-bot avatar rmeli avatar sastrys1 avatar swails avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

pytraj's Issues

basic tutorial: calculate distance

import pytraj as pt
  • by mask (same as cpptraj)
pt.distance(traj, '@C* @N*')
  • by a list/array/tuple ... of masks
pt.distance(traj, ['@C* @N*', '@H @CA'])
  • by integer index
pt.distance(traj, [[3,4],[6,7]])

Advantage of using distance by mask: use can calculate COM distance.

Make faster linking

Currently linking pytraj *.pyx file to libcpptraj is still too slow (about 4 times slower than compiling cpptraj). Is there anyway to make it faster?

basic tutorial: load xtc and gro files

currently cpptraj does not support gro and xtc files. We need to use mdtraj engine.

In [2]: traj = pt.load("./nvt.xtc", "nvt.gro", engine='mdtraj')

In [3]: traj
Out[3]:
<pytraj.Trajectory with 11 frames: <Topology with 88812 atoms, 28161 residues, 
1 mols, 0 bonds, PBC with box type = ortho>>

reimaging and interfacing with mdraj

Hi,

On suggestion of @swails , I'm moving mdtraj/mdtraj#814 to here. Thanks to @hainm , I have much cleaner code now for reimaging a single pdb. load_mdtraj was very helpful.

Since I want to add pairwise distances as features to my trajectory, I need to reimage all trajectories. However, I realize that while it's easy to go from mdtraj --> pytraj, it's nontrivial to go from pytraj --> mdtraj , either directly or via HDF5 format. So I had to develop a little workaround. It seems to work, but it would be nice if it were cleaner. The gist is that I load a .h5 trajectory in mdtraj, use pytraj's load_mdtraj feature to transfer it over, autoimage() it, save it as a dcd trajectory, separately save a frame as a pdb to act as the topology, re-load that dcd/pdb combo into mdtraj, and then save it once again as a .h5 to complete the cycle. It would be great if I could circumvent the middle steps and just dump a pytraj object straight to h5, or convert it back somehow to an mdtraj object. Any thoughts?

Cheers,
Evan

import mdtraj as md
import os
import h5py
import multiprocessing as mp
from functools import partial 
from pytraj import adict 

def get_trajectory_files(traj_dir, ext = ".h5"):
    traj_files = []
    for traj in os.listdir(traj_dir):
            if traj.endswith(ext):
                traj_files.append("%s/%s" %(traj_dir,traj))
    return sorted(traj_files)

def reimage_traj(traj_file, save_dir):
    traj = md.load(traj_file)
    topology = md.load_frame(traj_file,index=0)

    traj_pytraj = mdio.load_mdtraj(traj)
    traj_pytraj.autoimage()

    filename = traj_file.split("/")[len(traj_file.split("/"))-1]
    filename = filename.split(".")[0]
    dcd_filename = "%s.dcd" %filename
    top_filename = "%s.pdb" %filename
    h5_filename = "%s.h5" %filename
    new_dcd_file = "%s/%s" %(save_dir, dcd_filename)
    new_top_file = "%s/%s" %(save_dir, top_filename)
    new_h5_file = "%s/%s" %(save_dir, h5_filename)
    print new_dcd_file
    print new_top_file
    traj_pytraj.save(new_dcd_file)

    topology.save(new_top_file)

    new_traj = md.load(new_dcd_file, top = new_top_file)
    new_traj.save(new_h5_file)
    return


def reimage_trajs(traj_dir):
    new_dir = "%s_reimaged" %traj_dir

    if not os.path.exists(new_dir): os.makedirs(new_dir)

    trajs = get_trajectory_files(traj_dir)

    reimage = partial(reimage_traj, save_dir = new_dir)

    num_workers = mp.cpu_count()
    pool = mp.Pool(num_workers)
    pool.map(reimage, trajs)
    pool.terminate()
    return

traj_dir = "/scratch/users/enf/b2ar_analysis/subsampled_allprot_combined"
reimage_trajs(traj_dir)

basic tutorial: combine trajs with different topologies

cpptraj input is from RAHMAN in amber-mailinglist ("[AMBER] Error during action on loaded multiple trajectory in cpptraj 15")

parm 0system/md_out/0SPC.top
parm 10system/md_out/10SPC.top
parm 20system/md_out/20SPC.top
parm 30system/md_out/30SPC.top
parm 40system/md_out/40SPC.top
parm 50system/md_out/50SPC.top

#load trajectory

trajin 0system/md_out/md_file/6_md5_0SPC.crd
trajin 10system/md_out/md_file/6_md5_10SPC.crd parmindex 1
trajin 20system/md_out/md_file/6_md5_20SPC.crd parmindex 2
trajin 30system/md_out/md_file/6_md5_30SPC.crd parmindex 3
trajin 40system/md_out/md_file/6_md5_40SPC.crd parmindex 4
trajin 50system/md_out/md_file/6_md5_50SPC.crd parmindex 5

#Action command

strip :SPC outprefix nowater

trajout ura_0SPC.crd
trajout ura_10SPC.crd parmindex 1
trajout ura_20SPC.crd parmindex 2
trajout ura_30SPC.crd parmindex 3
trajout ura_40SPC.crd parmindex 4
trajout ura_50SPC.crd parmindex 5

too slow to convert parmed to pseudo_parm

46 s for below code. ack

>>> import parmed as pmd
>>> pdb = pmd.download_PDB("1o15")
>>> top = io.load_pseudo_parm(pdb)

reason: stuck at 'guess_bond' step. need to check cpptraj's code. (if remove this line, converting takes only < 2s)

How is pytraj different from MDAnalysis and mdtraj?

Note: this is from my experience of using MDAnalysis and mdtraj. Since I spend most of my time playing with pytraj, there will be bias for this package. If my information about MDAnalysis and mdtraj is not correct, please feel free to comment here.

Each sub-section below will be filled more in future.

basic tutorial: load trajectory's raw coordinates with minimum memory

Aim: load to memory 3D-coordinates of @ca atoms, autoimaged and rmsfit to referent?

>>> # load TrajectoryIterator for saving memory
>>> import pytraj as pt
>>> trajiter = pt.iterload(filename, topname)
>>> # load coordinates to memory
>>> pt.get_coordinates(trajiter(mask='@CA', autoimage=True, rmsfit=(0, '!@H=')))

what's new

support string mask and array-mask for most of pytraj calculation.

(07-09-2015)

  • calc_jcoupling
In [8]: pt.calc_jcoupling(traj, ':3-7')[0]
Out[8]:
<pytraj.array.DataArray: size=10, key=THR:3_HA-CA-CB-HB, dtype=float32, ndim=1>
values:
[ 2.23212647  0.74871767  9.29722214  9.31067276  9.12228203  9.2781868
  7.73652697  9.33726883  7.81325817  8.82886505]

In [9]: indices
Out[9]: array([ 37,  38,  39, ..., 108, 109, 110])

In [10]: pt.calc_jcoupling(traj, indices)[0]
Out[10]:
<pytraj.array.DataArray: size=10, key=THR:3_HA-CA-CB-HB, dtype=float32, ndim=1>
values:
[ 2.23212647  0.74871767  9.29722214  9.31067276  9.12228203  9.2781868
  7.73652697  9.33726883  7.81325817  8.82886505]

where indices is similar to atom_indices in mdtraj.

  • rmsd
In [13]: pt.rmsd(traj, mask=[0, 5, 8])
Out[13]:
array([  1.04477021e-04,   1.47252858e-01,   5.10274623e-02,
         3.93252889e-02,   4.26393624e-02,   5.11736383e-02,
         2.02085530e-02,   1.57345439e-02,   6.51517868e-02,
         3.08103397e-02])

In [14]: pt.rmsd(traj, mask="@1,6,9")
Out[14]:
array([  1.04477021e-04,   1.47252858e-01,   5.10274623e-02,
         3.93252889e-02,   4.26393624e-02,   5.11736383e-02,
         2.02085530e-02,   1.57345439e-02,   6.51517868e-02,
         3.08103397e-02])
  • distance
In [33]: pt.distance(traj, '@2 @3')
Out[33]:
array([ 1.5654015 ,  1.64738735,  1.57970842,  1.58422811,  1.62185271,
        1.70217518,  1.68401503,  1.67934319,  1.6281608 ,  1.69595361])

In [34]: pt.distance(traj, [1, 2])
Out[34]:
array([[ 1.5654015 ],
       [ 1.64738735],
       [ 1.57970842],
       [ 1.58422811],
       [ 1.62185271],
       [ 1.70217518],
       [ 1.68401503],
       [ 1.67934319],
       [ 1.6281608 ],
       [ 1.69595361]])

In [35]: pt.distance(traj, [[1, 2], [8,9]])
Out[35]:
array([[ 1.5654015 ,  2.03200257],
       [ 1.64738735,  2.03791449],
       [ 1.57970842,  2.03800926],
       ...,
       [ 1.67934319,  1.99721433],
       [ 1.6281608 ,  1.98635603],
       [ 1.69595361,  2.11291366]])

My motivation for this is amber mask is too long for simple mask, such as '@1 @2' vs [1, 2]. And too much @ makes script look not very pretty. However, amber mask is very powerful for complicated mask, such as ':1 :2'. So I implement both mask types.

PS: I also want to give credit to mdtraj since I find it convenient to use array-mask for simple case when playing with it.

TODO add mask = an array of strings too.

In [38]: pt.distance(traj, [':2 :3', '@4 @7'])
Out[38]:
array([[ 6.17554905,  6.32215072,  6.29471347, ...,  6.27913704,
         6.36268277,  6.28279934],
       [ 2.78077699,  2.49659225,  2.8758351 , ...,  2.92945951,
         2.6963403 ,  2.75728637]])

Why `FrameArray` and `TrajReadOnly`?

I think this is somewhat related to one of your comments in #260, but I wonder what the rationale is to have FrameArray and TrajReadOnly as two separate objects storing the same information.

This adds complexity to the underlying API (that does not exist in mdtraj), and this complexity is not alleviated by having convenience methods to translate between them, IMO. For a library application like this one (or mdtraj), I've found that those in which I can easily memorize the entire object model and data flow design are the easiest (and most enjoyable) to use. This includes even very complex packages, like pandas and matplotlib (they try hard to keep their object models clean and simple). I may have to look up what methods to use to do a certain task, but knowing the object model and data flow makes it easy to know where to look.

Unless I'm missing something, there's nothing that TrajReadOnly can do (functionally) that FrameArray cannot, correct? The only difference is that the former is immutable. So what's the point of TrajReadOnly? Why not always use FrameArray and just get rid of the other one? It's easy to make FrameArray immutable... just don't change it. I don't see why this has to be enforced by the library, and is rather against the design principles of Python itself (i.e., "We're all consenting adults").

pytraj branch note

note for me about different branches in hainm/pytraj

cleaning: cleanning cpptraj files
FrameIter: traj.frame_iter() will return this class. It's almost useless but giving information about start, stop, stride, ... in ipython. Intended for internal use only.
ptools: add tools module for storing some useful functions (as Jason suggested)

all above branches will merge each other. Don't merge to master yet.

dataset_enhance: can be merged to `master.

doc: just for prepare/write pytraj doc. can be merged to any branch.

dssp: writing DSSP class for easier extract data. Don't merge to master yet?

plot_nice: for plotting, can be merged to any.

todo

  • might use DataSet_Coords_TRJ instead of Trajin for TrajectoryIterator
    (check branch dataset_trj)

pytraj's note

this topis is NOT an issue. Just want to take note about pytraj's features. (mostly gathering bullet points for pytraj's paper).

@swails: FYI about slicing numpy array does not always give a view

slicing pytraj's Trajectory always gives a view

In [3]: import mdtraj as md

In [4]: from mdtraj.testing import get_fn
No module named 'pytz'

In [5]: # load to mdtraj's object

In [6]: m_traj = md.load(get_fn("frame0.h5"))

In [7]: # load to pytraj's object

In [8]: ptraj = io.load_mdtraj (m_traj)

In [9]: # create a `view` for `ptraj`

In [10]: view_ptraj = ptraj[[0, 1, 10]]

In [11]: print (view_ptraj[0, 0, 0])
4.289999902248383

In [12]: # make assignment

In [13]: view_ptraj[0, 0, 0] = 100.

In [14]: # make sure we update pytraj too

In [15]: print (ptraj[0, 0, 0])
100.0

In [16]: assert ptraj[0, 0, 0] == view_ptraj[0, 0, 0] == 100.

vs numpy version

In [4]: from mdtraj.testing import get_fn
No module named 'pytz'

In [5]: # load to mdtraj's object

In [6]: m_traj = md.load(get_fn("frame0.h5")) # from above step in `pytraj` demonstration

In [36]: view_m_traj = m_traj[[0, 1, 10]]

In [37]: print (view_m_traj.xyz[0, 0, 0]) # 0.429
0.429

In [38]: # make assignment

In [39]: view_m_traj.xyz[0, 0, 0] = 100.

In [40]: assert view_m_traj.xyz[0, 0, 0] == 100.

In [41]: # this does not update `m_traj`

In [42]: print (view_m_traj.xyz[0, 0, 0]) # 100.
100.0

In [43]: print (m_traj.xyz[0, 0, 0]) # 0.429, not updated
0.429

In [44]: # assert will fail

In [45]: assert m_traj.xyz[0, 0, 0] == view_m_traj.xyz[0, 0, 0] == 100.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-45-182db33f074b> in <module>()
----> 1 assert m_traj.xyz[0, 0, 0] == view_m_traj.xyz[0, 0, 0] == 100.

AssertionError:

too slow to import pytraj

0.8 s just for import pytraj as pt (vs 0.3 s for import numpy as np; import scipy), (vs ~0.4 s for both mdtraj and MDAnalysis). ack.

speed: pytraj vs mdtraj with rmsd

Note I have a ipython notebook before but I've just moved some codes from cython to python
to make faster code writing and testing. So just want to compare rmd calculation again.

In [68]: traj # pytraj
Out[68]:
<pytraj.Trajectory with 1000 frames: <Topology with 17443 atoms, 5666 residues, 
1 mols, 17452 bonds, PBC with box type = truncoct>>


In [69]: t # mdtraj
Out[69]: <mdtraj.Trajectory with 1000 frames, 17443 atoms, 5666 residues, 
and unitcells at 0x2aaacc4155c0>

inconsistent result pytraj vs cpptraj when combining autoimage with rmsfit

this happen when combining autoimage and rmsfit at the same time.

use only autoimage: match (rmsd<1E-10)
use only rmsfit: match

but if combining them, rmsd=0.6 to saved cpptraj traj. ack.

In [8]: import pytraj as pt

In [9]: traj = pt.iterload(['./data/tz2.ortho.nc', ], './data/tz2.ortho.parm7')

In [10]: xyz = pt.get_coordinates(traj(autoimage=True, rmsfit=(0, '@CA,C,N'))
)

In [11]: saved_traj = pt.iterload('./data/tz2.autoimage_with_rmsfit.nc', traj.top)

In [12]: rmsd(saved_traj.xyz, xyz)
Out[12]: 0.6349591011653271

all frames were shifted (vs saved traj from cpptraj)
rmsd

0.6349589654298733
0.6349591258302947
0.6349591168337528
0.6349591280414643
0.6349591161553934
0.634959115164595
0.6349591093341331
0.6349591142336568
0.6349591255565542
0.6349591065071268

basic tutorial: slicing trajectory

(prepare for pytraj's official doc website)

First, need to create a new Trajectory

In [24]: import pytraj as pt

In [25]: traj = pt.iterload("./data/md1_prod.Tc5b.x", "./data/Tc5b.top")

In [26]: traj
Out[26]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1 mols, 20 residues, 304 atoms, 
310 bonds, non-PBC>>

pytraj supports very simple slicing scheme. Basically, traj[x, y, z] is equal to traj[x][y][z]

In [42]: np.all(traj[2:8, '@CA'].xyz == traj[2:8]['@CA'].xyz)
Out[42]: True

Examples

  • get single frame
In [18]: traj[0]
Out[18]: <Frame with 304 atoms>
  • get new trajectory
In [19]: traj[2:8]
Out[19]:
<pytraj.Trajectory with 6 frames: <Topology with 1 mols, 20 residues, 304 atoms, 
310 bonds, non-PBC>>
  • get new trajectory having only CA atoms
In [20]: traj[2:8, '@CA']
Out[20]:
<pytraj.Trajectory with 6 frames: <Topology with 20 mols, 20 residues, 20 atoms, 
0 bonds, non-PBC>>
  • get a new Trajectory having only last frame and CA atoms
In [39]: traj[-1:, '@CA']
Out[39]:
<pytraj.Trajectory with 1 frames: <Topology with 20 mols, 20 residues, 20 atoms, 
0 bonds, non-PBC>>
  • get single coordinate for an atom
# traj[0, 0, :] # xyz coordinate of first atom in first frame
In [21]: traj[0, 0, :]
Out[21]: array([-16.492,  12.434, -11.018])

conda issue

conda install -c pytraj pytraj-dev

it's still tricky to install a pytraj version that will satisfy all users. It's better to install from source code (github) for now.

  • libstdc related issue
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/haichit/anaconda3/envs/python3.4_pytraj_conda_amber/lib/python3.4/site-
packages/pytraj/__init__.py", line 11, in <module>
    from .core import Atom, Residue, Molecule
  File "/home/haichit/anaconda3/envs/python3.4_pytraj_conda_amber/lib/python3.4/site-
packages/pytraj/core/__init__.py", line 2, in <module>
    from .Atom import Atom
ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by 
/home/haichit/anaconda3/envs/python3.4_pytraj_conda_amber/lib/python3.4/site-
packages/pytraj/core/../../../../libcpptraj.so)

basic tutorial: moving_average and plot

In [17]: import pytraj as pt

In [18]: dslist  = pt.calc_phi(traj)

In [19]: pt.tools.moving_average(dslist[0], 3).plot(show=True)
Out[19]: [<matplotlib.lines.Line2D at 0x2aaac5c065f8>]

vmd style Atom selection

Should we use vmd style atom selection in pytraj/cpptraj?

mdtraj program does pretty good job for this
http://mdtraj.org/latest/atom_selection.html

if "YES", we can use this shorter syntax: top("water") (return atom mask object)

>>> atm_object = top("water")
>>> # print selected indices for atom with given mask
>>> print atm_object.n_selected
>>> # we can use another way to get atom indices
>>> top.get_indices("water")
>>> # extract coordinates of atom with given mask
>>> frame[top("water")]
>>> # if we don't want to use above step, we can do this
>>> frame.set_top(top)
>>> frame["water"]

basic tutorial: hbond search

In [69]: import pytraj as pt

In [70]: pdb = pt.load_pdb_rcsb("1l2y")

In [71]: pdb
Out[71]:
<pytraj.Trajectory with 38 frames: <Topology with 304 atoms, 20 residues, 1 mols, 310 bonds, 
non-PBC>>


In [72]: dslist = pt.search_hbonds(pdb)

In [73]: h = pt.hbonds.HbondAnalysisResult(dslist)

In [74]: h
Out[74]: <pytraj.hbonds.HbondAnalysisResult at 0x2aac478062b0>

In [75]: h.lifetime(cut=0.5)
Out[75]:
{'ARG16_O-TRP6_NE1-HE1': 0.9473684210526315,
 'ILE4_O-LYS8_N-H': 0.7631578947368421,
 'LEU7_O-GLY10_N-H': 0.8947368421052632,
 'TYR3_O-LEU7_N-H': 1.0}

TODO: add text

Q&A

Draft for Q&A (will be copied to ipython-notebook)

pytraj developers don't recommnd to use traj.xyz (as a numpy 3D array with shape=(n_frames, n_atoms, 3))?

Because Trajectory object in pytraj is a wrapper of C++ vector in cpptraj. Everytime calling traj.xyz method, a copy of 3D array is made. User can not inplace update coords of each frame or specific atoms. Instead using traj[index] to do this. However, traj.xyz is still available to conviently access read-only data.

traj['@CA'] = xyz # inplace update `xyz` coords of all CA atoms

So why not designing xyz as mdtraj with numpy? Because cpptraj already has more than few hundreds types of action/analysis combination (from things very simple like calculating distance, dihedral, ... to very smart 'autoimage' action). If we design like mdtraj, we need to copy data from Frame object (cpptraj) to numpy data. If we want to use cpptraj's action/analysis, we again need to copy data from numpy back to Frame object.

We DO have numpy-backend for simple math for traj

traj += traj2
traj -= traj2
traj /= traj2
traj *= traj2

traj.apply(lambda x : x + 100.) # `x` is `xyz` coords of each frame

info: TrajectoryIterator and memory

super memory saving with 'TrajectoryIterator` (and out-of-core calculation)

  • user is able to load as many trajectory files to TrajectoryIterator
In [12]: len(traj.filelist)
Out[12]: 28432
  • estimated memory (MB) if loading all of them into RAM

(> 36 GB of data).

In [15]: traj._estimated_MB
Out[15]: 36119.00856

The real memory

In [16]: traj
Out[16]:
<pytraj.TrajectoryIterator with 284330 frames: <Topology with 5293 atoms, 1704 residues, 
1692 mols, 5300 bonds, PBC with box type = ortho>>

In [16] used 0.0117 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 
271.99 MiB

performance: pytraj vs cpptraj

just want to test how much pytraj is slower than cpptraj.

info

In [18]: traj
Out[18]:
<pytraj.TrajectoryIterator with 1000 frames: 

<Topology with 17443 atoms, 5666 residues, 5634 mols, 17452 bonds, 
PBC with box type = truncoct>>

cpptraj speed:

In [16]: %%time
pt.datafiles.cpptraj_dry_run("""
parm ./myparm.top
trajin remd.x.000
autoimage
rms first @CA
matrix dist @CA""")
   ....:
CPU times: user 3.5 s, sys: 346 ms, total: 3.85 s
Wall time: 3.85 s

pytraj speed

In [17]: %time pt.matrix_analysis.distance_matrix(traj(autoimage=True, rmsfit=(0, '@CA')), 
                                                                                  '@CA')
CPU times: user 4.12 s, sys: 358 ms, total: 4.48 s
Wall time: 4.48 s

overall, it tooks about 3.5 s for cpptraj while 4.12 s for pytraj.

Now I only need to worry about memory leak in some parts of pytraj.

Note: both cpptraj and pytraj use out-of-core calculation. If all frames are loaded to memory, above calculation only took 2.86 s for pytraj (vs 3.5 s for cpptraj).

memory error

wrong result

In [18]: from pytraj import dihedral_analysis as da
In [19]: da.calc_psi (traj, dtype='ndarray')

Right result

In [21] dslist = da.calc_psi (traj)
In [22] dslist.to_ndarray()

Reason: memory is freed to soon. :D

memory leak

In [148]: %memit traj.calc_matrix()[:]
peak memory: 379.74 MiB, increment: 0.85 MiB

In [149]: %memit traj.calc_matrix()[:]
peak memory: 380.65 MiB, increment: 0.90 MiB

In [150]: %memit traj.calc_matrix()[:]
peak memory: 381.54 MiB, increment: 0.87 MiB

In [151]: %memit traj.calc_matrix()[:]
peak memory: 382.46 MiB, increment: 0.91 MiB

In [152]: %memit traj.calc_matrix()[:]
peak memory: 383.37 MiB, increment: 0.89 MiB

segmentation fault

STATUS: FIXED

  • traj.search_hbonds()[0], object lifetime again. I am not sure I need to use 'dtype='dataset'' as default input or not. May be just "dtype='dict'" to return a dict of numpy array. Trouble would be solved.
    But it's nice to understand about those stuff.

Note:

  • traj.search_hbonds(dtype='dict')['ASN_1@OD1-ASN_1@N-H3'] is ok.
  • traj.search_hbonds(dtype='dataframe')[0] is ok
  • traj.search_hbonds(dtype='ndarray')[0] is ok

basic tutorial: save traj with autoimage and rmsfit

In [16]: import pytraj as pt

In [17]: traj = pt.iterload(['./data/tz2.ortho.nc', ], './data/tz2.ortho.parm7')

In [18]: pt.write_traj("test.nc", traj(autoimage=True, rmsfit=(0, '@CA,N,O')), overwrite=True)

basic tutorial: download pdb, rmsd then plot

In [1]: import pytraj as pt

In [2]: pdb = pt.load_pdb_rcsb("1l2y")

In [3]: pdb.rmsd(mask="@CA", dtype='dataset').plot(True)
Out[3]: [<matplotlib.lines.Line2D at 0x2aaac49bd438>]

change `common_actions` name to something else

currently most of pytraj's actions (calc_, do_, ..) are in this common_actions module. It's terrible name. Need to rename it.

Ideally, we can split to sub-actions like mdtraj, but cpptraj has > 100 actions. It's not easy to keep track.

So, for most commonly used ones (rmsd, radgyr, dihedral, angle, rdf, vector, ..) are in pytraj main namespace.

We also have `pytraj.

In [2]: pytraj.dihedral_analysis.
pytraj.dihedral_analysis.absolute_import           pytraj.dihedral_analysis.calc_omega
pytraj.dihedral_analysis.calc_alpha                pytraj.dihedral_analysis.calc_phi
pytraj.dihedral_analysis.calc_beta                 pytraj.dihedral_analysis.calc_psi
pytraj.dihedral_analysis.calc_chin                 pytraj.dihedral_analysis.calc_zeta
pytraj.dihedral_analysis.calc_chip                 pytraj.dihedral_analysis.g_dict
pytraj.dihedral_analysis.calc_delta                pytraj.dihedral_analysis.iteritems
pytraj.dihedral_analysis.calc_epsilon              pytraj.dihedral_analysis.my_func_str
pytraj.dihedral_analysis.calc_gamma                pytraj.dihedral_analysis.print_function
pytraj.dihedral_analysis.calc_multidihedral        pytraj.dihedral_analysis.supported_dihedral_types
pytraj.dihedral_analysis.calc_nu1                  pytraj.dihedral_analysis.template
pytraj.dihedral_analysis.calc_nu2

basic tutorial: load traj to memory with "autoimage, rmsfit and strip atoms"

  • Aim: load traj to memory with autoimage, then rms to first frame with @cb mask, then strip all atoms but @ca.
>>> import pytraj as pt
>>> traj = pt.iterload(your_big_traj_file, top_file)
>>> for frame in traj(mask='@CA', autoimage=True, rmsfit=(traj[0], '@CB')):
>>>    pass

is equal to cpptraj's command below

parm top_file
trajin your_big_traj_file
autoimage
rms first @CB
strip !@CA

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.