Comments (20)
should I use indexing or frame_iter to get frame from Trajectory
Always try to use frame_iter for optimized speed.
Instead of doing
for i in range(1000):
traj[i]
SHOULD do
for frame in traj:
frame
or enumerate
for idx, frame in enumerate(traj):
frame
or
for frame in traj(10, 100000, 10): # equal to range(10, 100001, 10)
frame
Speed test
from pytraj.testing import make_fake_traj
# make a fake Trajectory with 1000 frames, 100K atoms
traj = make_fake_traj(1000, 100000)
print (traj)
@Timer()
def test_frame_iter_speed():
for frame in traj:
frame
n_frames = traj.n_frames
@Timer()
def test_frame_indexing_speed():
for i in range(n_frames):
traj[i]
print ("test_frame_iter_speed")
test_frame_iter_speed() # only 0.00012 s in my computer
print ("test_frame_indexing_speed")
test_frame_indexing_speed() # 0.145 s (> 1000 times slower)
from pytraj.
Should I use Trajectory
class or TrajectoryIterator
?
if memory is not a big problem, you should use Trajectory
class, which hold a chunks of frames in memory. Things are much faster (like frame iterator). However, if the trajectory file is too big, it's highly recommended to use TrajectoryIterator
to save memory. You still can use fancy indexing for this iterator (either returning a Frame
or a mutable Trajectory
object).
There are many way to construct Trajectory
and TrajectoryIterator
. But the esiest way to remember is to use pytraj.io
module with load
(return Trajectory
) and iterload
(return TrajectoryIterator
) methods.
(See also the dicussion about those classes here) (#262)
from pytraj.
What is the different between for frame in traj[1:11:2]
and for frame in traj(1, 10, 2)
, where "traj" is a TrajectoryIterator
object?
Per memory efficiency, traj(1, 10, 2) is better. This is shortcut of traj.frame_iter(start=1, stop=10, stride=2)
(which is also similiar to cpptraj's command: trajin my_trajectory.nc 2 11 2
. When calling traj(1, 10, 2)
, only one single Frame is loaded into memory.
print (traj(1, 10, 2))
<generator object frame_iter at 0x2aaac8f7daf8>
In constrast, calling traj[1:11:2]
, pytraj
does the slicing first to create in-memory Trajectory
object, and then iterate frame.
Example
print (traj[1:11:2])
<pytraj.Trajectory with 5 frames: <Topology with 1692 mols, 1704 residues,
5293 atoms, 5300 bonds, PBC with box type = ortho>>
See also here for speed comparison.
from pytraj.
pytraj has operator traj += 1.
but why does it not have traj0 = traj1 + 1.
?
Answer: Because of memory saving. Just think about handling 2G Trajectory in memory. It's disaster if you call traj = 2 * traj + 3 * traj
traj += 1.
is in place operator, no extra copy for traj
.
traj0 = traj1 + 1.
is equal to traj0 = traj1.copy(); traj0 += 1.
If you want to have more complicated math + memory saving, you should use apply
method.
traj.apply(lambda x : 2 * x + 1.)
This is equal to
traj *= 2.
traj += 1.
from pytraj.
What's advantage of using "traj.apply" rather than using "numpy"?
traj.apply(lambda x : x * 2. + 1.) # where `x` is `xyz` coords of each frame
# vs
xyz = xyz * 2 + 1 # xyz is 3D-numpy array (shape=(n_frames, n_atoms, 3)
Answer
- memory efficient
traj.apply
perform calculation for eachframe
. In worst case, only a copy ofxyz
coords for each Frame is copied. With numpy, whole traj's xyz is copied. - speed: faster for very large system (at least 3 times faster for my tested system (500 frames, 169376 atoms)).
from pytraj.testing import make_fake_traj, Timer, duplicate_traj, aa_eq
import pytraj.io as io
from pytraj.misc import merge_trajs
# making fake large traj
traj = io.load_sample_data('tz2')[:]
for i in range(5):
traj = merge_trajs(traj.copy(), traj.copy())
traj = duplicate_traj(traj, 50)
xyz = traj.xyz[:].copy()
print (traj)
# end of making fake traj
@Timer()
def test_apply(traj):
traj.apply(lambda x : 2 * x + 1)
@Timer()
def test_apply_similiar_numpy(xyz):
xyz = xyz * 2. + 1.
test_apply(traj)
test_apply_similiar_numpy(xyz)
Note: to avoid heavy copy for numpy array, use in-place operator xyz *= 2.; xyz += 1.
** Note **: speed might be different depending on available memory.
from pytraj.
I still want to use mutable xyz
coords of traj, should I move on to another package?
Answer
If you still want to use numpy extensively with mutable xyz
coords, you can still use pytraj.api.Trajectory
class. The advantage of using this class is just like the one from mdtraj
, which you can perform all numpy methods with xyz
without making extra data copy. However, in order to use cpptraj
's actions, pytraj
will internally convert to frame
(extra copying). Besides copying issue, you can fully use all cpptraj's actions + strength of numpy
. Note that fancy indexing like 'traj['@ca'] is not implemented for this class.
It's trivial to convert pytraj's Trajectory and TrajectoryIterator class to this api.Trajectory
class.
In [33]: from pytraj import io
In [34]: traj = io.load_sample_data('tz2')
In [35]: traj
Out[35]:
<pytraj.TrajectoryIterator with 10 frames: <Topology with 1692 mols,
1704 residues, 5293 atoms, 5300 bonds, PBC with box type = ortho>>
In [36]: from pytraj import api
In [37]: t = api.Trajectory(traj)
In [38]: t
Out[38]: Trajectory with 10 frames, 5293 atoms
# you can mutate `xyz` coords by
In [39]: t.xyz[0, 0]
Out[39]: array([ 15.55458927, 28.54844856, 17.18908691])
In [40]: t.xyz[0, 0] += 1.
In [41]: t.xyz[0, 0]
Out[41]: array([ 16.55458927, 29.54844856, 18.18908691])
# you can even use this class with `mdtraj` package (for some analyses such as rmsd)
In [43]: md.rmsd(t, t, 0)
Out[43]:
array([ 0. , 8.28397369, 9.8498888 , 11.27068233,
11.86576557, 12.52786446, 13.33117199, 13.53176975,
13.80132675, 14.00614452], dtype=float32)
# let's compare to pytraj
In [50]: t.rmsd(0)
Out[50]: array('d', [0.0, 8.283967389538908, 9.849893186468968, 11.2706769011418,
11.865759799173082, 12.527866218740536, 13.331165639093559, 13.53176951142984,
13.801334666412323, 14.00614642923093])
from pytraj.
it's too slow to convert Frame.xyz to numpy array, about 3 us for only 10 atom-Frame.
Answer
It's constant time to cast to numpy memoryview. It takes similar time to cast from 1 million-atom Frame too. This is not problem since you only need cast once without making extra data copying.
cat timing.py
from pytraj.testing import make_random_frame
import numpy as np
from timeit import timeit
def test_time(n_atoms):
f = make_random_frame(n_atoms)
def test():
np.asarray(f)
print (timeit(test, number=1000000))
for n_atoms in [10, 100, 1000, 100000, 1000000]:
test_time(n_atoms)
Casting to numpy array-view for 1 million frames only take 6 s
.
python timing.py
6.04225823841989
6.05758654139936
6.077507931739092
6.064353432506323
6.1148266065865755
from pytraj.
why fancy indexing in pytraj
for Trajectory does not follow numpy
indexing?
In [9]: traj['@CA', 5:7, 0]
Out[9]: <Frame with 20 atoms>
In [10]: traj[:, :, 0]
Out[10]: <Frame with 304 atoms>
In [11]: xyz = traj.xyz
In [12]: xyz[:, :, 0]
Out[12]:
array([[-16.492, -15.656, -17.148, ..., 20.432, 21.025, 20.795],
[ 8.094, 7.667, 8.414, ..., 0.818, 0.859, 0.977],
[ 1.759, 1.225, 1.248, ..., 1.33 , 2.35 , 1.074],
...,
[ 8.674, 9.412, 7.767, ..., 0.167, 1.399, -0.574],
[ -6.327, -6.377, -7.054, ..., 2.357, 1.394, 2.156],
[ -6.128, -7.091, -5.744, ..., -8.726, -8.979, -7.769]])
Answer
- because
Trajectory
is not a numpy array. You should usexyz
attribute to get raw array (Note: expensive to do this withTrajectoryIerator
) pytraj
implements very simple fancy indexing scheme. For example,traj[1:3, :, 0]
is equal totraj[1:3][:][0]
# from left to right.
In [13]: traj[1:3, :, 0]
Out[13]: <Frame with 304 atoms>
In [14]: traj[1:3][:][ 0]
Out[14]: <Frame with 304 atoms>
from pytraj.
so why does frame['@ca'] NOT return a new Frame object?
Answer
I actually don't know which one is more appropriate, whether frame['@ca'] should return a new Frame or should return xyz
coords.
Current implementation of pytraj
return coords for frame[mask] or frame[AtomMask_object]
In [1]: from pytraj import io
In [2]: f = io.load_sample_data ()[0]
In [3]: traj = io.load_sample_data ()
In [4]: f = traj[0]
In [5]: f
Out[5]: <Frame with 34 atoms>
In [6]: f.set_top(traj.top)
In [7]: f['@CA']
Out[7]:
array([[ 3.97004800e+00, 2.84579500e+00, -1.00000000e-07],
[ 7.64000760e+00, 3.83858370e+00, -7.40000000e-06],
[ 1.01610562e+01, 6.68437820e+00, 1.20000000e-06]])
# fancy indexing as a numpy array
In [10]: f['@CA', :, 2] # z-coord for all atoms
Out[10]: array([ -1.00000000e-07, -7.40000000e-06, 1.20000000e-06])
from pytraj.
get wrong file size when importing pytraj' module
Answer
It's likely that you made changes to pxd
file extension. The best way to do this is
- remove all cythonzied
*.cpp
files bypython clean_cythonize_files.py
(in the root folder) python setup.py install
again (require havingCython
)
Note: To accelerate the compiling, you can use multiprocessing
, just
python ./setup.py build -faster_build
- then
python setup.py install
- (if you don't see the speeding up with
-faster_builid
, just cancel (Ctrl-C
) and run again).
from pytraj.
turn on cpptraj' log
from pytraj import set_world_silent
set_world_silent(False)
from pytraj.
got "undefined symbol" error
Answer you might have several "libcpptraj.so" files. Need to export/set "LD_LIBRARY_PATH" correctly.
(Note: not done this Q&A yet)
from pytraj.
why not directly use pandas
or toolz
?
Because pytraj
only need several methods. So we either try to reinvent the wheel
or shamelessly copy a small part of their code (all have BSD-like license).
pandas
is great package but its API is very complicated with lots of methods and it aims broad users. pytraj
only focus on analyzing data from MD simulation.
from pytraj.
list comprehensions give identical frames when iterating TrajectoryIterator
updated: fixed. no identical frames anymore.
old answer
Answer
To save memory, only a single Frame will be loaded at a time when iterating TrajectoryIterator. So
[frame for frame in traj]
is just a list of frame pointers pointing to the same memory block. That's why you got identical frames.
To get different frames, use copy
: [frame.copy() for frame in traj]
. This should be carefully done since you might end up loading tens of Gb to memory.
from pytraj.
undefined symbol: _ZN14MaskTokenArray13SelectedChar_E
Answer
try again:
conda install libcpptraj --force -c pytraj
conda install pytraj-dev --force -c pytraj
and test: python -c 'import pytraj as pt; pt.run_tests()'
from pytraj.
Error: Could not determine trajectory format ".nc"
Answer Need to install libcpptraj
with libnetcdf
.
If you are using conda
, you can try conda install libcpptraj --force -c pytraj
from pytraj.
why pytraj.load
supports loading from url but pytraj.iterload
does not?
Answer: pytraj.load
load data to memory, so it's easy to load from url. pytraj.iterload
is lazy loading, a small chunk of data will be loaded to memory if ONLY needed. So pytraj.iterload
need absolute dir of trajectory file; loading from url create a file in /tmp folder.
In [4]: pt.load(url)
Out[4]:
<pytraj.Trajectory with 1 frames:
<Topology with 30 atoms, 1 residues, 1 mols, 32 bonds, non-PBC>>
In [5]: pt.iterload(url)
Error: File 'http://ambermd.org/tutorials/basic/tutorial4/files/sustiva.mol2'
does not exist.
from pytraj.
ImportError: libhdf5_hl.so.9: cannot open shared object file: No such file or directory
Answer: try conda install libnetcdf
updated: man, this is conda issue
from pytraj.
ValueError: Big-endian buffer not supported on little-endian compiler
Answer wrong dtype
from pytraj.
close. we have website now.
from pytraj.
Related Issues (20)
- Pytraj nastruct needs to be updated to handle [nxyz] data HOT 9
- wrong major groove calculation HOT 8
- calculated basepair matches in pytraj HOT 2
- Installation error of pytraj with pip HOT 4
- energy_decompostion throws "Fortran runtime error: Bad value during integer read" HOT 1
- Has anyone tried a Python 3.11 conda package build yet? HOT 11
- Spurious Test Failures with Cpptraj HOT 9
- ModuleNotFoundError: No module named 'pytraj.core.topology_objects' HOT 15
- Help: error building pytraj native library when make install 91% HOT 10
- Function to find the atoms of a given mask that appear in a region of xyz coordinate space? HOT 10
- hbond pmap_mpi output format HOT 7
- Make binary wheel for pytraj? HOT 5
- pytraj.plot module not found HOT 2
- Reporting Issues with Pytraj using tutorials HOT 28
- pytraj segmentation fault with cython 3.0 and cpptraj commit f1d21f688de5ea5c55e650abe6efdbc352bdc613 HOT 1
- pmap parallel error HOT 6
- How do I import pytraj? HOT 4
- nh_order_parameters
- pytraj not compatible with OSX-arm64 arch
- Trying to pip install pytraj HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pytraj.