amber-md / pytraj Goto Github PK

Python interface of cpptraj

Home Page: https://amber-md.github.io/pytraj

Shell 0.34% Python 18.23% Rebol 5.41% Batchfile 0.01% Jupyter Notebook 66.93% CSS 0.37% JavaScript 0.07% Makefile 0.01% Roff 1.01% CMake 0.06% Cython 7.58%

interactive-analysis molecular-dynamics visualization trajectory-analysis ambermd ambertools

pytraj's Introduction

Install | Features | How to get started? | Visualization

PYTRAJ

A Python front-end of [cpptraj program] (https://github.com/Amber-MD/cpptraj) (a data analysis package for biomolecular simulation).

Website: http://amber-md.github.io/pytraj

Features
Install
How to get started?
Contributors and Acknowledgement
Citation
Question/Suggestion?
nglview with pytraj in Jupyter notebook
Demo: Interactive data exploration with Jupyter notebook
License

Features

support more than 80 types of data analyses (rmsd, radgyr, autoimage, pca, clustering,...)
read/write various file formats (.nc, .mdcrd, .dcd, .trr, .xtc, .pdb, .mol2, ...)
fast (core codes were written in C++ and Cython)
support parallel calculation with trivial installation (openmp, multiprocessing, mpi, ...)
interactive analysis with large trajectory data that does not fit to memory
[>> many more with comprehensive tutorials] (http://amber-md.github.io/pytraj)

Install

Supported platforms: Linux, OSX

The best way is to install AmberTools via conda: conda install -c conda-forge ambertools compilers (https://ambermd.org/GetAmber.php)
from AMBER suite distribution http://ambermd.org/.
from conda: conda install -c ambermd pytraj # Outdated versions, not support python >= 3.9
from pip: pip install pytraj # Outdated versions, not support python >= 3.9

from source code:

git clone https://github.com/amber-md/pytraj
cd pytraj

python setup.py install

# Note: openmp will be turned off in OSX.

# AMBER user: overwrite pytraj in $AMBERHOME
# For expert user only
python setup.py install --prefix=$AMBERHOME

Getting trouble? : check our webpage

How to get started?

examples:

import pytraj as pt
traj = pt.iterload("data.nc", "top.parm7")
pt.rmsd(traj, mask='@CA', ref=0)
pt.dssp(traj, mask=':2-16')
pt.pca(traj, mask='!@H=', n_vecs=2)

check our website: [http://amber-md.github.io/pytraj] (http://amber-md.github.io/pytraj)

Contributors and Acknowledgement

Please check here

Citation

If you would like to acknowledge our work, please cite both cpptraj and pytraj.

Something like:

"...used pytraj [1], a Python package binding to cpptraj program [2]"

[1] PYTRAJ: Interactive data analysis for molecular dynamics simulations. Hai Nguyen, Daniel R. Roe, Jason Swails, David A. Case. (2016)
[2] [PTRAJ and CPPTRAJ] (http://pubs.acs.org/doi/abs/10.1021/ct400341p): Software for Processing and Analysis of Molecular Dynamics Trajectory Data. Daniel R. Roe and Thomas E. Cheatham, III Journal of Chemical Theory and Computation 2013 9 (7), 3084-3095

Question/Suggestion?

code issue and stuff relating directly to pytraj, create Issue
ask question about data analysis in general, send email to [AMBER Mailing List] (http://lists.ambermd.org/mailman/listinfo/amber)

nglview with pytraj in Jupyter notebook

Demo: Interactive data exploration with [Jupyter notebook] (http://jupyter.org/)

License

GPL v3 (since pytraj is derived work of cpptraj)

But if you would like to reuse code snippets and pieces independent of cpptraj, I am (Hai) happy to license them (pieces of codes) under BSD-2 Clause or whatever you like. Just buzz us.

pytraj's People

Contributors

Stargazers

Watchers

Forkers

hainm drroe swails lambdalisue josejames00 rafwiewiora sridharkumarkannam nshung2010 envirodug multiplemonomials brycestx hongbo-zhu-cn aspirincode rmeli nschimme shozebhaider gimba merzlab recherhe alirezatafazzol jackyzhao010 burmist-git minghao2016 rahmanhpu htsurui watarukotsugai peacewolfus007 hai-schrodinger antonsperera ceiba-wu hichiaty mackevinbraza aariam qqlaoxia sastrys1 ftry phimykah drdomenicomarson

pytraj's Issues

segementation fault for DSSP + Hbond when loading external package (mdtraj, mdanalysis)

need to read cpptraj's code to make sure loading correct info.

basic tutorial: calculate distance

import pytraj as pt

by mask (same as cpptraj)

pt.distance(traj, '@C* @N*')

by a list/array/tuple ... of masks

pt.distance(traj, ['@C* @N*', '@H @CA'])

by integer index

pt.distance(traj, [[3,4],[6,7]])

Advantage of using distance by mask: use can calculate COM distance.

Make faster linking

Currently linking pytraj *.pyx file to libcpptraj is still too slow (about 4 times slower than compiling cpptraj). Is there anyway to make it faster?

nobox does not work with DataSet_Coords_TRJ

happen when switching from Trajin class to DataSet_Coords_TRJ

(why switching? to load multiple trajectory files into a single TrajectoryIterator)

basic tutorial: load xtc and gro files

currently cpptraj does not support gro and xtc files. We need to use mdtraj engine.

In [2]: traj = pt.load("./nvt.xtc", "nvt.gro", engine='mdtraj')

In [3]: traj
Out[3]:
<pytraj.Trajectory with 11 frames: <Topology with 88812 atoms, 28161 residues, 
1 mols, 0 bonds, PBC with box type = ortho>>

pytraj + python trick

adapted from http://sahandsaba.com/thirty-python-language-features-and-tricks-you-may-not-know.html

and other sources and my experience.

basic tutorial: calculate "chi" dihedral angles for DNA and polar plot

In [8]: dslist = pt.calc_chin(traj, range360=True)

In [9]: pt.plotting.polar(dslist[0], 'r.')
Out[9]: <matplotlib.axes._subplots.PolarAxesSubplot at 0x7fbed788bb70>

In [10]: pt.plotting.plt.savefig("test.png")

try and except

just to remind me about using proper except catch

hainm/ParmEd#2

ipython-parallel interactive with ptraj

Need to fill detail.

Travis reports "passing" build even with failures

See https://travis-ci.org/hainm/pytraj/jobs/61145601#L2590-L2591

There is a failure in the test, but Travis still tags the build as having "passed". Is this intentional? It's rather atypical in my experience...

reimaging and interfacing with mdraj

Hi,

On suggestion of @swails , I'm moving mdtraj/mdtraj#814 to here. Thanks to @hainm , I have much cleaner code now for reimaging a single pdb. load_mdtraj was very helpful.

Since I want to add pairwise distances as features to my trajectory, I need to reimage all trajectories. However, I realize that while it's easy to go from mdtraj --> pytraj, it's nontrivial to go from pytraj --> mdtraj , either directly or via HDF5 format. So I had to develop a little workaround. It seems to work, but it would be nice if it were cleaner. The gist is that I load a .h5 trajectory in mdtraj, use pytraj's load_mdtraj feature to transfer it over, autoimage() it, save it as a dcd trajectory, separately save a frame as a pdb to act as the topology, re-load that dcd/pdb combo into mdtraj, and then save it once again as a .h5 to complete the cycle. It would be great if I could circumvent the middle steps and just dump a pytraj object straight to h5, or convert it back somehow to an mdtraj object. Any thoughts?

Cheers,
Evan

import mdtraj as md
import os
import h5py
import multiprocessing as mp
from functools import partial 
from pytraj import adict 

def get_trajectory_files(traj_dir, ext = ".h5"):
    traj_files = []
    for traj in os.listdir(traj_dir):
            if traj.endswith(ext):
                traj_files.append("%s/%s" %(traj_dir,traj))
    return sorted(traj_files)

def reimage_traj(traj_file, save_dir):
    traj = md.load(traj_file)
    topology = md.load_frame(traj_file,index=0)

    traj_pytraj = mdio.load_mdtraj(traj)
    traj_pytraj.autoimage()

    filename = traj_file.split("/")[len(traj_file.split("/"))-1]
    filename = filename.split(".")[0]
    dcd_filename = "%s.dcd" %filename
    top_filename = "%s.pdb" %filename
    h5_filename = "%s.h5" %filename
    new_dcd_file = "%s/%s" %(save_dir, dcd_filename)
    new_top_file = "%s/%s" %(save_dir, top_filename)
    new_h5_file = "%s/%s" %(save_dir, h5_filename)
    print new_dcd_file
    print new_top_file
    traj_pytraj.save(new_dcd_file)

    topology.save(new_top_file)

    new_traj = md.load(new_dcd_file, top = new_top_file)
    new_traj.save(new_h5_file)
    return


def reimage_trajs(traj_dir):
    new_dir = "%s_reimaged" %traj_dir

    if not os.path.exists(new_dir): os.makedirs(new_dir)

    trajs = get_trajectory_files(traj_dir)

    reimage = partial(reimage_traj, save_dir = new_dir)

    num_workers = mp.cpu_count()
    pool = mp.Pool(num_workers)
    pool.map(reimage, trajs)
    pool.terminate()
    return

traj_dir = "/scratch/users/enf/b2ar_analysis/subsampled_allprot_combined"
reimage_trajs(traj_dir)

basic tutorial: combine trajs with different topologies

cpptraj input is from RAHMAN in amber-mailinglist ("[AMBER] Error during action on loaded multiple trajectory in cpptraj 15")

parm 0system/md_out/0SPC.top
parm 10system/md_out/10SPC.top
parm 20system/md_out/20SPC.top
parm 30system/md_out/30SPC.top
parm 40system/md_out/40SPC.top
parm 50system/md_out/50SPC.top

#load trajectory

trajin 0system/md_out/md_file/6_md5_0SPC.crd
trajin 10system/md_out/md_file/6_md5_10SPC.crd parmindex 1
trajin 20system/md_out/md_file/6_md5_20SPC.crd parmindex 2
trajin 30system/md_out/md_file/6_md5_30SPC.crd parmindex 3
trajin 40system/md_out/md_file/6_md5_40SPC.crd parmindex 4
trajin 50system/md_out/md_file/6_md5_50SPC.crd parmindex 5

#Action command

strip :SPC outprefix nowater

trajout ura_0SPC.crd
trajout ura_10SPC.crd parmindex 1
trajout ura_20SPC.crd parmindex 2
trajout ura_30SPC.crd parmindex 3
trajout ura_40SPC.crd parmindex 4
trajout ura_50SPC.crd parmindex 5

too slow to convert parmed to pseudo_parm

46 s for below code. ack

>>> import parmed as pmd
>>> pdb = pmd.download_PDB("1o15")
>>> top = io.load_pseudo_parm(pdb)

reason: stuck at 'guess_bond' step. need to check cpptraj's code. (if remove this line, converting takes only < 2s)

How is pytraj different from MDAnalysis and mdtraj?

Note: this is from my experience of using MDAnalysis and mdtraj. Since I spend most of my time playing with pytraj, there will be bias for this package. If my information about MDAnalysis and mdtraj is not correct, please feel free to comment here.

Each sub-section below will be filled more in future.

TODO on travis

should print out the test script having segmentation fault. https://travis-ci.org/hainm/pytraj/jobs/65121697

basic tutorial: load trajectory's raw coordinates with minimum memory

Aim: load to memory 3D-coordinates of @ca atoms, autoimaged and rmsfit to referent?

>>> # load TrajectoryIterator for saving memory
>>> import pytraj as pt
>>> trajiter = pt.iterload(filename, topname)
>>> # load coordinates to memory
>>> pt.get_coordinates(trajiter(mask='@CA', autoimage=True, rmsfit=(0, '!@H=')))

what's new

support string mask and array-mask for most of pytraj calculation.

(07-09-2015)

calc_jcoupling

In [8]: pt.calc_jcoupling(traj, ':3-7')[0]
Out[8]:
<pytraj.array.DataArray: size=10, key=THR:3_HA-CA-CB-HB, dtype=float32, ndim=1>
values:
[ 2.23212647  0.74871767  9.29722214  9.31067276  9.12228203  9.2781868
  7.73652697  9.33726883  7.81325817  8.82886505]

In [9]: indices
Out[9]: array([ 37,  38,  39, ..., 108, 109, 110])

In [10]: pt.calc_jcoupling(traj, indices)[0]
Out[10]:
<pytraj.array.DataArray: size=10, key=THR:3_HA-CA-CB-HB, dtype=float32, ndim=1>
values:
[ 2.23212647  0.74871767  9.29722214  9.31067276  9.12228203  9.2781868
  7.73652697  9.33726883  7.81325817  8.82886505]

where indices is similar to atom_indices in mdtraj.

rmsd

In [13]: pt.rmsd(traj, mask=[0, 5, 8])
Out[13]:
array([  1.04477021e-04,   1.47252858e-01,   5.10274623e-02,
         3.93252889e-02,   4.26393624e-02,   5.11736383e-02,
         2.02085530e-02,   1.57345439e-02,   6.51517868e-02,
         3.08103397e-02])

In [14]: pt.rmsd(traj, mask="@1,6,9")
Out[14]:
array([  1.04477021e-04,   1.47252858e-01,   5.10274623e-02,
         3.93252889e-02,   4.26393624e-02,   5.11736383e-02,
         2.02085530e-02,   1.57345439e-02,   6.51517868e-02,
         3.08103397e-02])

distance

In [33]: pt.distance(traj, '@2 @3')
Out[33]:
array([ 1.5654015 ,  1.64738735,  1.57970842,  1.58422811,  1.62185271,
        1.70217518,  1.68401503,  1.67934319,  1.6281608 ,  1.69595361])

In [34]: pt.distance(traj, [1, 2])
Out[34]:
array([[ 1.5654015 ],
       [ 1.64738735],
       [ 1.57970842],
       [ 1.58422811],
       [ 1.62185271],
       [ 1.70217518],
       [ 1.68401503],
       [ 1.67934319],
       [ 1.6281608 ],
       [ 1.69595361]])

In [35]: pt.distance(traj, [[1, 2], [8,9]])
Out[35]:
array([[ 1.5654015 ,  2.03200257],
       [ 1.64738735,  2.03791449],
       [ 1.57970842,  2.03800926],
       ...,
       [ 1.67934319,  1.99721433],
       [ 1.6281608 ,  1.98635603],
       [ 1.69595361,  2.11291366]])

My motivation for this is amber mask is too long for simple mask, such as '@1 @2' vs [1, 2]. And too much @ makes script look not very pretty. However, amber mask is very powerful for complicated mask, such as ':1 :2'. So I implement both mask types.

PS: I also want to give credit to mdtraj since I find it convenient to use array-mask for simple case when playing with it.

TODO add mask = an array of strings too.

In [38]: pt.distance(traj, [':2 :3', '@4 @7'])
Out[38]:
array([[ 6.17554905,  6.32215072,  6.29471347, ...,  6.27913704,
         6.36268277,  6.28279934],
       [ 2.78077699,  2.49659225,  2.8758351 , ...,  2.92945951,
         2.6963403 ,  2.75728637]])

Why `FrameArray` and `TrajReadOnly`?

I think this is somewhat related to one of your comments in #260, but I wonder what the rationale is to have FrameArray and TrajReadOnly as two separate objects storing the same information.

This adds complexity to the underlying API (that does not exist in mdtraj), and this complexity is not alleviated by having convenience methods to translate between them, IMO. For a library application like this one (or mdtraj), I've found that those in which I can easily memorize the entire object model and data flow design are the easiest (and most enjoyable) to use. This includes even very complex packages, like pandas and matplotlib (they try hard to keep their object models clean and simple). I may have to look up what methods to use to do a certain task, but knowing the object model and data flow makes it easy to know where to look.

Unless I'm missing something, there's nothing that TrajReadOnly can do (functionally) that FrameArray cannot, correct? The only difference is that the former is immutable. So what's the point of TrajReadOnly? Why not always use FrameArray and just get rid of the other one? It's easy to make FrameArray immutable... just don't change it. I don't see why this has to be enforced by the library, and is rather against the design principles of Python itself (i.e., "We're all consenting adults").

pytraj branch note

note for me about different branches in hainm/pytraj

cleaning: cleanning cpptraj files
FrameIter: traj.frame_iter() will return this class. It's almost useless but giving information about start, stop, stride, ... in ipython. Intended for internal use only.
ptools: add tools module for storing some useful functions (as Jason suggested)

all above branches will merge each other. Don't merge to master yet.

dataset_enhance: can be merged to `master.

doc: just for prepare/write pytraj doc. can be merged to any branch.

dssp: writing DSSP class for easier extract data. Don't merge to master yet?

plot_nice: for plotting, can be merged to any.

basic example: Nucleic acid analysis

to prepare for official pytraj-doc website.

todo

might use DataSet_Coords_TRJ instead of Trajin for TrajectoryIterator
(check branch dataset_trj)

inconsistent API for getitem and setitem

traj['@ca'] return a new (copied)Trajectory with stripped atoms.

traj['@ca'] = xyz # assign coords for CA atoms

pytraj's note

this topis is NOT an issue. Just want to take note about pytraj's features. (mostly gathering bullet points for pytraj's paper).

@swails: FYI about slicing numpy array does not always give a view

slicing pytraj's `Trajectory` always gives a `view`

In [3]: import mdtraj as md

In [4]: from mdtraj.testing import get_fn
No module named 'pytz'

In [5]: # load to mdtraj's object

In [6]: m_traj = md.load(get_fn("frame0.h5"))

In [7]: # load to pytraj's object

In [8]: ptraj = io.load_mdtraj (m_traj)

In [9]: # create a `view` for `ptraj`

In [10]: view_ptraj = ptraj[[0, 1, 10]]

In [11]: print (view_ptraj[0, 0, 0])
4.289999902248383

In [12]: # make assignment

In [13]: view_ptraj[0, 0, 0] = 100.

In [14]: # make sure we update pytraj too

In [15]: print (ptraj[0, 0, 0])
100.0

In [16]: assert ptraj[0, 0, 0] == view_ptraj[0, 0, 0] == 100.

vs numpy version

In [4]: from mdtraj.testing import get_fn
No module named 'pytz'

In [5]: # load to mdtraj's object

In [6]: m_traj = md.load(get_fn("frame0.h5")) # from above step in `pytraj` demonstration

In [36]: view_m_traj = m_traj[[0, 1, 10]]

In [37]: print (view_m_traj.xyz[0, 0, 0]) # 0.429
0.429

In [38]: # make assignment

In [39]: view_m_traj.xyz[0, 0, 0] = 100.

In [40]: assert view_m_traj.xyz[0, 0, 0] == 100.

In [41]: # this does not update `m_traj`

In [42]: print (view_m_traj.xyz[0, 0, 0]) # 100.
100.0

In [43]: print (m_traj.xyz[0, 0, 0]) # 0.429, not updated
0.429

In [44]: # assert will fail

In [45]: assert m_traj.xyz[0, 0, 0] == view_m_traj.xyz[0, 0, 0] == 100.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-45-182db33f074b> in <module>()
----> 1 assert m_traj.xyz[0, 0, 0] == view_m_traj.xyz[0, 0, 0] == 100.

AssertionError:

too slow to import pytraj

0.8 s just for import pytraj as pt (vs 0.3 s for import numpy as np; import scipy), (vs ~0.4 s for both mdtraj and MDAnalysis). ack.

speed: pytraj vs mdtraj with rmsd

Note I have a ipython notebook before but I've just moved some codes from cython to python
to make faster code writing and testing. So just want to compare rmd calculation again.

In [68]: traj # pytraj
Out[68]:
<pytraj.Trajectory with 1000 frames: <Topology with 17443 atoms, 5666 residues, 
1 mols, 17452 bonds, PBC with box type = truncoct>>


In [69]: t # mdtraj
Out[69]: <mdtraj.Trajectory with 1000 frames, 17443 atoms, 5666 residues, 
and unitcells at 0x2aaacc4155c0>

inconsistent result pytraj vs cpptraj when combining autoimage with rmsfit

this happen when combining autoimage and rmsfit at the same time.

use only autoimage: match (rmsd<1E-10)
use only rmsfit: match

but if combining them, rmsd=0.6 to saved cpptraj traj. ack.

In [8]: import pytraj as pt

In [9]: traj = pt.iterload(['./data/tz2.ortho.nc', ], './data/tz2.ortho.parm7')

In [10]: xyz = pt.get_coordinates(traj(autoimage=True, rmsfit=(0, '@CA,C,N'))
)

In [11]: saved_traj = pt.iterload('./data/tz2.autoimage_with_rmsfit.nc', traj.top)

In [12]: rmsd(saved_traj.xyz, xyz)
Out[12]: 0.6349591011653271

all frames were shifted (vs saved traj from cpptraj)
rmsd

0.6349589654298733
0.6349591258302947
0.6349591168337528
0.6349591280414643
0.6349591161553934
0.634959115164595
0.6349591093341331
0.6349591142336568
0.6349591255565542
0.6349591065071268

TODO: more descriptive name for Action_Matrix

make separate module for this class and use more specific method's name.

distance_matrix, correlation_matrix, ...

basic tutorial: slicing trajectory

(prepare for pytraj's official doc website)

First, need to create a new Trajectory

In [24]: import pytraj as pt

In [25]: traj = pt.iterload("./data/md1_prod.Tc5b.x", "./data/Tc5b.top")

In [26]: traj
Out[26]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1 mols, 20 residues, 304 atoms, 
310 bonds, non-PBC>>

pytraj supports very simple slicing scheme. Basically, traj[x, y, z] is equal to traj[x][y][z]

In [42]: np.all(traj[2:8, '@CA'].xyz == traj[2:8]['@CA'].xyz)
Out[42]: True

Examples

get single frame

In [18]: traj[0]
Out[18]: <Frame with 304 atoms>

get new trajectory

In [19]: traj[2:8]
Out[19]:
<pytraj.Trajectory with 6 frames: <Topology with 1 mols, 20 residues, 304 atoms, 
310 bonds, non-PBC>>

get new trajectory having only CA atoms

In [20]: traj[2:8, '@CA']
Out[20]:
<pytraj.Trajectory with 6 frames: <Topology with 20 mols, 20 residues, 20 atoms, 
0 bonds, non-PBC>>

get a new Trajectory having only last frame and CA atoms

In [39]: traj[-1:, '@CA']
Out[39]:
<pytraj.Trajectory with 1 frames: <Topology with 20 mols, 20 residues, 20 atoms, 
0 bonds, non-PBC>>

get single coordinate for an atom

# traj[0, 0, :] # xyz coordinate of first atom in first frame
In [21]: traj[0, 0, :]
Out[21]: array([-16.492,  12.434, -11.018])

conda issue

conda install -c pytraj pytraj-dev

it's still tricky to install a pytraj version that will satisfy all users. It's better to install from source code (github) for now.

libstdc related issue

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/haichit/anaconda3/envs/python3.4_pytraj_conda_amber/lib/python3.4/site-
packages/pytraj/__init__.py", line 11, in <module>
    from .core import Atom, Residue, Molecule
  File "/home/haichit/anaconda3/envs/python3.4_pytraj_conda_amber/lib/python3.4/site-
packages/pytraj/core/__init__.py", line 2, in <module>
    from .Atom import Atom
ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by 
/home/haichit/anaconda3/envs/python3.4_pytraj_conda_amber/lib/python3.4/site-
packages/pytraj/core/../../../../libcpptraj.so)

basic tutorial: moving_average and plot

In [17]: import pytraj as pt

In [18]: dslist  = pt.calc_phi(traj)

In [19]: pt.tools.moving_average(dslist[0], 3).plot(show=True)
Out[19]: [<matplotlib.lines.Line2D at 0x2aaac5c065f8>]

vmd style Atom selection

Should we use vmd style atom selection in pytraj/cpptraj?

mdtraj program does pretty good job for this
http://mdtraj.org/latest/atom_selection.html

if "YES", we can use this shorter syntax: top("water") (return atom mask object)

>>> atm_object = top("water")
>>> # print selected indices for atom with given mask
>>> print atm_object.n_selected
>>> # we can use another way to get atom indices
>>> top.get_indices("water")
>>> # extract coordinates of atom with given mask
>>> frame[top("water")]
>>> # if we don't want to use above step, we can do this
>>> frame.set_top(top)
>>> frame["water"]

implement "describe" and str like in pandas

do it because it's ridiculous to install pandas to just use two methods. I myself got problem with using pandas in PY3.

basic tutorial: hbond search

In [69]: import pytraj as pt

In [70]: pdb = pt.load_pdb_rcsb("1l2y")

In [71]: pdb
Out[71]:
<pytraj.Trajectory with 38 frames: <Topology with 304 atoms, 20 residues, 1 mols, 310 bonds, 
non-PBC>>


In [72]: dslist = pt.search_hbonds(pdb)

In [73]: h = pt.hbonds.HbondAnalysisResult(dslist)

In [74]: h
Out[74]: <pytraj.hbonds.HbondAnalysisResult at 0x2aac478062b0>

In [75]: h.lifetime(cut=0.5)
Out[75]:
{'ARG16_O-TRP6_NE1-HE1': 0.9473684210526315,
 'ILE4_O-LYS8_N-H': 0.7631578947368421,
 'LEU7_O-GLY10_N-H': 0.8947368421052632,
 'TYR3_O-LEU7_N-H': 1.0}

TODO: add text

pytraj notebook in github

github is so cool. We can have our notebooks rendered directly here.

https://github.com/pytraj/pytraj/blob/master/note-books/index.ipynb

Q&A

Draft for Q&A (will be copied to ipython-notebook)

pytraj developers don't recommnd to use traj.xyz (as a numpy 3D array with shape=(n_frames, n_atoms, 3))?

Because Trajectory object in pytraj is a wrapper of C++ vector in cpptraj. Everytime calling traj.xyz method, a copy of 3D array is made. User can not inplace update coords of each frame or specific atoms. Instead using traj[index] to do this. However, traj.xyz is still available to conviently access read-only data.

traj['@CA'] = xyz # inplace update `xyz` coords of all CA atoms

So why not designing xyz as mdtraj with numpy? Because cpptraj already has more than few hundreds types of action/analysis combination (from things very simple like calculating distance, dihedral, ... to very smart 'autoimage' action). If we design like mdtraj, we need to copy data from Frame object (cpptraj) to numpy data. If we want to use cpptraj's action/analysis, we again need to copy data from numpy back to Frame object.

We DO have numpy-backend for simple math for traj

traj += traj2
traj -= traj2
traj /= traj2
traj *= traj2

traj.apply(lambda x : x + 100.) # `x` is `xyz` coords of each frame

info: TrajectoryIterator and memory

super memory saving with 'TrajectoryIterator` (and out-of-core calculation)

user is able to load as many trajectory files to TrajectoryIterator

In [12]: len(traj.filelist)
Out[12]: 28432

estimated memory (MB) if loading all of them into RAM

(> 36 GB of data).

In [15]: traj._estimated_MB
Out[15]: 36119.00856

The real memory

In [16]: traj
Out[16]:
<pytraj.TrajectoryIterator with 284330 frames: <Topology with 5293 atoms, 1704 residues, 
1692 mols, 5300 bonds, PBC with box type = ortho>>

In [16] used 0.0117 MiB RAM in 0.11s, peaked 0.00 MiB above current, total RAM usage 
271.99 MiB

performance: pytraj vs cpptraj

just want to test how much pytraj is slower than cpptraj.

info

In [18]: traj
Out[18]:
<pytraj.TrajectoryIterator with 1000 frames: 

<Topology with 17443 atoms, 5666 residues, 5634 mols, 17452 bonds, 
PBC with box type = truncoct>>

cpptraj speed:

In [16]: %%time
pt.datafiles.cpptraj_dry_run("""
parm ./myparm.top
trajin remd.x.000
autoimage
rms first @CA
matrix dist @CA""")
   ....:
CPU times: user 3.5 s, sys: 346 ms, total: 3.85 s
Wall time: 3.85 s

pytraj speed

In [17]: %time pt.matrix_analysis.distance_matrix(traj(autoimage=True, rmsfit=(0, '@CA')), 
                                                                                  '@CA')
CPU times: user 4.12 s, sys: 358 ms, total: 4.48 s
Wall time: 4.48 s

overall, it tooks about 3.5 s for cpptraj while 4.12 s for pytraj.

Now I only need to worry about memory leak in some parts of pytraj.

Note: both cpptraj and pytraj use out-of-core calculation. If all frames are loaded to memory, above calculation only took 2.86 s for pytraj (vs 3.5 s for cpptraj).

Rewrite FrameArray class to have 3D array-memoryview

xyz = np.asarray(farray.xyz)

xyz should be memory view of farray.xyz

memory error

wrong result

In [18]: from pytraj import dihedral_analysis as da
In [19]: da.calc_psi (traj, dtype='ndarray')

Right result

In [21] dslist = da.calc_psi (traj)
In [22] dslist.to_ndarray()

Reason: memory is freed to soon. :D

work with `parmed` to share internal data for Topology.

work with parmed to share internal data for Topology.

memory leak

In [148]: %memit traj.calc_matrix()[:]
peak memory: 379.74 MiB, increment: 0.85 MiB

In [149]: %memit traj.calc_matrix()[:]
peak memory: 380.65 MiB, increment: 0.90 MiB

In [150]: %memit traj.calc_matrix()[:]
peak memory: 381.54 MiB, increment: 0.87 MiB

In [151]: %memit traj.calc_matrix()[:]
peak memory: 382.46 MiB, increment: 0.91 MiB

In [152]: %memit traj.calc_matrix()[:]
peak memory: 383.37 MiB, increment: 0.89 MiB

basic tutorial: download pdb, calc_bfactors and plot

plot by residue

In [6]: pdb = pt.load_pdb_rcsb("1l2y")
In [7]: pt.calc_bfactors(pdb, '@CA', dtype='dataset').plot(True)
Out[7]: [<matplotlib.lines.Line2D at 0x2aaac6036978>]

segmentation fault

STATUS: FIXED

traj.search_hbonds()[0], object lifetime again. I am not sure I need to use 'dtype='dataset'' as default input or not. May be just "dtype='dict'" to return a dict of numpy array. Trouble would be solved.
But it's nice to understand about those stuff.

Note:

traj.search_hbonds(dtype='dict')['ASN_1@OD1-ASN_1@N-H3'] is ok.
traj.search_hbonds(dtype='dataframe')[0] is ok
traj.search_hbonds(dtype='ndarray')[0] is ok

basic tutorial: save traj with autoimage and rmsfit

In [16]: import pytraj as pt

In [17]: traj = pt.iterload(['./data/tz2.ortho.nc', ], './data/tz2.ortho.parm7')

In [18]: pt.write_traj("test.nc", traj(autoimage=True, rmsfit=(0, '@CA,N,O')), overwrite=True)

memoization

basic tutorial: auto_correlation_function, cross_correlation_function with plot

In [3]: import pytraj as pt
In [4]: pt.auto_correlation_function(data, dtype='dataset').plot(True)

basic tutorial: download pdb, rmsd then plot

In [1]: import pytraj as pt

In [2]: pdb = pt.load_pdb_rcsb("1l2y")

In [3]: pdb.rmsd(mask="@CA", dtype='dataset').plot(True)
Out[3]: [<matplotlib.lines.Line2D at 0x2aaac49bd438>]

change `common_actions` name to something else

currently most of pytraj's actions (calc_, do_, ..) are in this common_actions module. It's terrible name. Need to rename it.

Ideally, we can split to sub-actions like mdtraj, but cpptraj has > 100 actions. It's not easy to keep track.

So, for most commonly used ones (rmsd, radgyr, dihedral, angle, rdf, vector, ..) are in pytraj main namespace.

We also have `pytraj.

In [2]: pytraj.dihedral_analysis.
pytraj.dihedral_analysis.absolute_import           pytraj.dihedral_analysis.calc_omega
pytraj.dihedral_analysis.calc_alpha                pytraj.dihedral_analysis.calc_phi
pytraj.dihedral_analysis.calc_beta                 pytraj.dihedral_analysis.calc_psi
pytraj.dihedral_analysis.calc_chin                 pytraj.dihedral_analysis.calc_zeta
pytraj.dihedral_analysis.calc_chip                 pytraj.dihedral_analysis.g_dict
pytraj.dihedral_analysis.calc_delta                pytraj.dihedral_analysis.iteritems
pytraj.dihedral_analysis.calc_epsilon              pytraj.dihedral_analysis.my_func_str
pytraj.dihedral_analysis.calc_gamma                pytraj.dihedral_analysis.print_function
pytraj.dihedral_analysis.calc_multidihedral        pytraj.dihedral_analysis.supported_dihedral_types
pytraj.dihedral_analysis.calc_nu1                  pytraj.dihedral_analysis.template
pytraj.dihedral_analysis.calc_nu2

basic tutorial: load traj to memory with "autoimage, rmsfit and strip atoms"

Aim: load traj to memory with autoimage, then rms to first frame with @cb mask, then strip all atoms but @ca.

>>> import pytraj as pt
>>> traj = pt.iterload(your_big_traj_file, top_file)
>>> for frame in traj(mask='@CA', autoimage=True, rmsfit=(traj[0], '@CB')):
>>>    pass

is equal to cpptraj's command below

parm top_file
trajin your_big_traj_file
autoimage
rms first @CB
strip !@CA