Comments (31)
Hi,
this is because of historical issue. :D I started to wrap Trajin_Single in cpptraj's first (TrajReadOnly). But latter I need to have another class to hold a chunk of data in memory, so I created FrameArray (similar to FrameArray in cpptraj but pytraj has more control).
The only reason I want to expose TrajReadOnly in pytraj
because it's friendly to memory. when calling load
method in TrajReadOnly
, only general checking is done (if file exist, n_frames, ..) but the data is not loaded to memory. User need to iterate frame by frame to get the data. This approach is very similar to MDAnalyis
program.
While playing with TrajReadOnly, I wanted to slice it, like traj[0:20:2] to get a chunk of frames. There
is no way I can store new frames in TrajReadOnly, so FrameArray was born to do this job. Whenever calling FrameArray(traj_name, top_name), all data will be loaded into memory. In my experience, this is good for small size traj (like the ones people store in h5 file) but quite slow to load for large data. For example this is loading time for 17K atoms, 1000 frames, netcdf
# load whole traj into memory by FrameArray in pytraj
# I don't use %timeit because it's terriblly slow in my computer
%time io.load(filename, top_name)[:] # FrameArray
# using cpptraj's class (Trajin_Single)
%time io.load(filename, top_name) # TrajReadOnly
%time md.load_netcdf(filename, top=top_name) # mdtra's Trajectory
# pytraj fa: -1 (but not that slow)
CPU times: user 2.09 s, sys: 449 ms, total: 2.54 s # FrameArray
Wall time: 2.54 s
CPU times: user 191 ms, sys: 14 ms, total: 205 ms # TrajReadOnly
Wall time: 205 ms
CPU times: user 1.71 s, sys: 330 ms, total: 2.04 s # mdtraj
Wall time: 2.24 s
you can see that FrameArray and mdtraj have similar loading time (1.7-2.1 s), which is much slower
than TrajReadOnly (191 ms). This is the good thing about TrajReadOnly. I've playing with pytraj
a lot to test it's functionality so sometimes I am not really patient to wait for 2 s for just testing something simple, TrajReadOnly can offer this job.
For FrameArray, I also like its offers: iterating over frames is really fast http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/speed_test_2_trajs.ipynb
iterating and slicing is fast because no data is copied (not like mdtraj
, if I know correctly). All you do is just moving pointer over C++ vector of Frames. Whenever slicing FrameArray, like fa[0:10:20], we only create a new 'view' of the original traj (which mean updating new resulted FrameArray will change coords of original array; we can explicitly use copy
method though). Even with fa[0] will result only a 'view' of Frame object in fa
.
Why not an object like Trajectory
in mdtraj
?
Because I want to keep all methods and information for Frame object. cpptraj have Frame class and pytraj
wrap it with the same name. Creating a vector of Frames (FrameArray) is easiest way to have all C++ Frames objects without copying data to external object and can keep all Frames' method too.
For example, I am experimenting api.Trajectory
, which is really similar to mdtraj
(just a wrapper of numpy
3D-array). https://github.com/hainm/pytraj/blob/master/pytraj/api.py
However, whenever want to use cpptraj
actions/analyeses, I need to convert api.Trajectory
to something cpptraj
can understand (Frame object). This is quite expensive because of data copying.
So in summary, TrajReadOnly
is like Universe
object in MDAnalysis
(in term of need frame iterating to do analysis) and FrameArray
is like Trajectory
object in mdtraj
(in term of loading a chunk of data into memory). Each has pros and cons that I am still considering.
(I noted here: http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/speed_test_2_trajs.ipynb). I've been playing with two other packages (MDAnalysis
and mdtraj
) to find their good and not-good things to make pytraj
better.
But you're right that simple API is much more robust than trying to make things very complicated. I will think more about designing it. ( and actually pytraj
is much more friendly and efficient than the one I first introduced to you and Dan. :D ; all suggestions were/are always taken).
thanks.
Hai
from pytraj.
btw, I created index
file for all notebooks I wrote to keep track myself and for others too. You might want to have a look at it.
http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/index.ipynb
PS: matplotlib
is not very easy to use in my opinion. I used to spend hours to just need to find out how to have 4 subplots in the same figures and have nice layout (the most trouble is find out how not to overlap x, y axis labels and make them look nice). It has too many options to choose (rather making default one robust). At that time, I just use my MS ppt slide to organize the figure layout :D.
(for example. even with their official website, the demo plot looks terrible
http://matplotlib.org/examples/statistics/histogram_demo_multihist.html)
Hai
from pytraj.
ah, I really don't like FrameArray and TrajRreadOny's names. :D (Trajectory
seems to be better)
from pytraj.
oh, two more points about TrajReadOnly
- suppose you have very large trajectory (few hundreds of GB data), or you have a long list (tens of ) of large trajectories, use
TrajReadOnly
is more efficient for iterative exploring inipython notebook
.
you simply just call: traj = io.load(your_traj_name, your_top_name)
. then you are free to get the very end frame, like frame = traj[1000000]
. If you want to slice the last chunk, just traj[:1000:2000]
and you will get in-memory FrameArray
object.
To the best of my limited knowledge about mdtraj
, you need to use md.iterload
to load a chunk of data. What's about getting a random chunk? call iterload
again and use if else
statement?.
What's about loading few frames? mdtraj
can use md.load(..., indices=...). But for pytraj
, you just traj[indices]
rather calling load
again but still not need to load TB of data to disk.
frame iterator? it's simple: for frame in traj(start, stop, stride, mask, autoimage=True)
chunk iterator? it's simple: for chunk in traj(start=10, chunk=50)
.
The main idea is to use both frame iterator
, chunk iterator
and indexing
with minimal effort.
cpptraj
andpytraj
handleremd
trajectory very well.
traj = io.load_remd(filename, top_name, T=300.0)
# note: data is not yet loaded to memory
whenever slicing traj[index]
, cpptraj will loop all remd.x.$ext
to find a frame with T=300.0. You know for sure that loading remd data to memory is not really a good idea if running stuff in TIP3P, 64 replicas. I think this is very cool for interactive work in ipython (notebook too).
Hai
from pytraj.
OK, so correct me if I'm wrong, but it seems to me that the advantage of TrajReadOnly
is in trajectory iteration.
Because I want to keep all methods and information for Frame object. cpptraj have Frame class and pytraj wrap it with the same name. Creating a vector of Frames (FrameArray) is easiest way to have all C++ Frames objects without copying data to external object and can keep all Frames' method too.
Do you get a numpy
array if you get, for instance, the xyz
attribute of FrameArray
? If not, does it implement the array
API? (For example, pandas.Series
is no longer a numpy.ndarray
subclass, but still implements the array
API so it can be used in place of numpy
arrays in most cases by virtue of duck typing, as I understand it.) Also, numpy is quite efficient with memory -- unlike standard Python containers, slicing a numpy array does not copy any data -- it simply exposes a view of the same data. Which is why when you assign to a numpy slice, it changes the underlying object:
>>> import numpy as np
>>> arr = np.arange(10)
>>> arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> arr[2:8] = 0
>>> arr
array([0, 1, 0, 0, 0, 0, 0, 0, 8, 9])
Since mdtraj
uses raw numpy.ndarray
objects for its trajectories, all slices are views (as I hope they are on FrameArray
)
Anyway, I digress. Let me suggest something for pytraj here:
TrajReadOnly
becomesTrajectoryIterator
or something (to let people know it's an optimized iterator)FrameArray
becomesTrajectory
or something, which is the main "object" for storing trajectory datapytraj.load()
returns aFrameArray
(or whatever you name it)pytraj.iterload()
returns aTrajReadOnly
(or whatever you name it)
This is more consistent with Python's data model, IMO (an iterator -- like a generator -- cannot be assigned to because there's nothing to assign to, since it's not in memory). Consider:
>>> iterable = [x for x in range(10)]
>>> iterable[8] = 0
>>> iterable
[0, 1, 2, 3, 4, 5, 6, 7, 0, 9]
>>> iterable = (x for x in range(10))
>>> iterable[8] = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object does not support item assignment
The way you've described it to me, this is the difference between FrameArray
(top) and TrajReadOnly
(bottom) -- and Python supports both because there is a clear use case.
PS: matplotlib is not very easy to use in my opinion. I used to spend hours to just need to find out how to have 4 subplots in the same figures and have nice layout (the most trouble is find out how not to overlap x, y axis labels and make them look nice). It has too many options to choose (rather making default one robust). At that time, I just use my MS ppt slide to organize the figure layout :D.
It's easy to get started. Publication-quality plots are more challenging, but you can uncover the layers of complexity in matplotlib as you need them, and through it all their object model is consistent (Artists, Patches, Line2Ds, Axes, etc.). A Line2D
object is used for the axis lines, the tick lines, the error-bar lines, the error bar hats, the plot lines, ... every line on the plot that you see is an instance of the same object, so when you know how to modify one, you can modify them all. And so on. The combination of flexibility and a complicated target application (general "plotting") means there's no way to avoid complexity in the API, but they mitigate that IMO with a minimal core object model.
As for the layout -- you are looking for the tight_layout
method :). (http://matplotlib.org/users/tight_layout_guide.html)
from pytraj.
A little clarification -- TrajReadOnly
is better than just a pure generator -- it's more like the range
object in Python (very cool!).
Consider:
Python 3.3.5 (default, Aug 24 2014, 10:02:17)
[GCC 4.7.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> range(10)
range(0, 10)
>>> range(10)[5]
5
from pytraj.
Compare that to the much more limited pure generator:
Python 3.3.5 (default, Aug 24 2014, 10:02:17)
[GCC 4.7.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> (x for x in range(10))
<generator object <genexpr> at 0x7f0e9bdeacd0>
>>> (x for x in range(10))[5]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'generator' object is not subscriptable
So it's a subscriptable generator -- I would still stress this as an iterator. Calling it TrajReadOnly
detracts from how really cool it is, and why and when you should use it.
from pytraj.
@swails: you know that I am really bad at naming stuff. For example, I used to have a method called
load_to_ParmEd_object
, then I changed to _load_chem
.
TrajReadOnly
is not only an iterator. You can still iterate it, or slice it. :D. But definitely I need to rename it. Its current's name look ugly to see in my eye.
Hai
from pytraj.
The name should describe what it's good for. TrajReadOnly
is good in the way that xrange
is good. It's a subscriptable iterator (subscription indicates that it implements __getitem__
to retrieve a single entry or a slice). Its name should reflect that, IMO.
I like load_to_ParmEd_object
better than _load_chem
(or load_parmed
or something more descriptive). load_chem
is non-descriptive to the point of obfuscated. For private methods this doesn't matter as much, but in 6 months you'll know what load_parmed
does quicker than load_chem
;).
from pytraj.
OK, so correct me if I'm wrong, but it seems to me that the advantage of TrajReadOnly is in trajectory iteration.
Yes. (python3 use lots of iterator for memory reason :D, so I follow this)
Do you get a numpy array if you get, for instance, the xyz attribute of FrameArray?
Yes. framearray.xyz
and trajreadonly.xyx
will return a copy
of its coords. And xyz
is a numpy array. To get xyx
, I need to iterate all frames and copy data to numpy array. https://github.com/hainm/pytraj/blob/master/pytraj/FrameArray.pyx#L283
It's a copy
because the memory layout is different (C++ vector of Frames does not have continuous chunk of memory for xyz
coords). However, for single Frame, calling xyz
will return a numpy array as a view
of frame.xyz (thanks Dan to keep frame.xAddress() to return a double* pointer).
you can look here and find that cython
is so cool https://github.com/hainm/pytraj/blob/master/pytraj/Frame.pyx#L309
Also, numpy is quite efficient with memory -- unlike standard Python containers, slicing a numpy array does not copy any data -- it simply exposes a view of the same data.
this only happen if you don't break the "contiguous" things for memory. Please check this
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing
from pytraj.
Since mdtraj uses raw numpy.ndarray objects for its trajectories, all slices are views (as I hope they are on FrameArray)
I am experiencing api.Trajectory
to use numpy xyz
too. The slicing of api.Trajectory
is 10000 times faster than mdtraj
and iterating it is superfast too. check link here for slicing and iterating section http://nbviewer.ipython.org/github/pytraj/pytraj/blob/master/note-books/speed_test_2_trajs.ipynb
This is very promising. However I still need to know what I really need. cpptraj
already have hundreds of kinds of analysis (great thanks to all developers). So if just use FrameArray
or TrajReadOnly
is more efficient since I don't need to copy back and forth xyz
from numpy to cpptraj`' Frame object.
And honesty I don't know what's advantage of using numpy's xyz
like the one in mdtraj
. if user need raw data? just xyz = traj.xyz
. I think most of the time user just need to play with read-only data to do analysis rather changing xyz
.
To update xyzy
for a specific Frame from numpy
array? it's easy too
fa[0].xyz[:] = numpy_xyz_array # fa is `FrameArray`
Another point, since mdtraj
use xyz
directly for some of calculation, like rmsd
, you can use pytraj
' objects directly in mdtraj
without converting too.
import mdtraj as md
md.rmsd(fa, fa, 0) # fa is `FrameArray` object.
all slices are views (as I hope they are on FrameArray)
I love to have this too but it's impossible with current implementation of cpptraj. As I said before, the memory layout is different.
from pytraj.
Anyway, I digress. Let me suggest something for pytraj here:
TrajReadOnly becomes TrajectoryIterator or something (to let people know it's an optimized iterator)
great. noted.
FrameArray becomes Trajectory or something, which is the main "object" for storing trajectory data
Yes, noted.
pytraj.load() returns a FrameArray (or whatever you name it)
pytraj.iterload() returns a TrajReadOnly (or whatever you name it)
Noted. But what's about we can index TrajectoryIterator
too? (TrajectoryIterator[0]. TrajectoryIterator[100:1000]). or this is ok (like the example you show me)
overall, I think this is great suggestion (as pytraj
name). I think I am willing to break my design now with the new name (pytraj is still in dev-status anyway)
Hai
from pytraj.
The name should describe what it's good for. TrajReadOnly is good in the way that xrange is good. It's a subscriptable iterator (subscription indicates that it implements getitem to retrieve a single entry or a slice). Its name should reflect that, IMO.
I see. Great.
I like load_to_ParmEd_object better than _load_chem (or load_parmed or something more descriptive). load_chem is non-descriptive to the point of obfuscated. For private methods this doesn't matter as much, but in 6 months you'll know what load_parmed does quicker than load_chem ;).
_load_chem
is for my personal use in testing. _load_chem(parm_name)
will return chemistry
object. pytraj
has official io.load_ParmEd
to convert ParmEd
object to pytraj
object.
I made _load_chem
because I am too lazy to type
import chemistry as chem
parm = chem.load_file(parm_name)
while in pytraj
, I just io._load_chem(parm_name)
(note: io
is already loaded from my very early comment, so I don't need to call import pytraj.io as io
again). :D
from pytraj.
Since mdtraj uses raw numpy.ndarray objects for its trajectories, all slices are views (as I hope they are on FrameArray)
I just realize that slicing FrameArray
will get a view too. But in this case, we are not slicing xyz
, we slice vector
. Check this code (when slicing FrameArray, a pointer
is returned)
https://github.com/hainm/pytraj/blob/master/pytraj/FrameArray.pyx#L428
(but of-course, it's impossible to slice and strip atoms at the same time to get a view
). For example
fa['@CA'] # return a copy since we slice and strip all but @CA atoms
from pytraj.
I can wait to change the names, so just push few hundreds of files to pytraj/pytraj :D
thanks for your suggestion.
Hai
from pytraj.
oops, I'just check and it turned out that slicing FrameArray (fa[:3]) will return a copy. :D
(I guess c++ vector make a copy when using v.push_back)
for a single Frame, fa[0] will return a 'view
of Frame.
from pytraj.
And honesty I don't know what's advantage of using numpy's xyz like the one in mdtraj. if user need raw data? just xyz = traj.xyz. I think most of the time user just need to play with read-only data to do analysis rather chaning xyz.
I disagree. A large number of analyses require RMSD alignment, centering and imaging, or just stripping out the solvent altogether. If I recall details of previous git log messages (they've become too numerous for me to continue following them), this can't be done with a TrajReadOnly
as they're immutable. Most common tasks require this, too -- like PCA, calculating RMSDs, B-factor (atomicfluct), average structure calculations, ... etc.
I can wait to change the names, so just push few hundreds of files to pytraj/pytraj :D
thanks for your suggestion.
FWIW, git is unique in that it separates "content" (i.e., all of the lines of code, git calls them 'blobs') from the files that they're in ("tree" information). So renaming, moving, copying, etc. different files will not actually weigh down a git repository, since a lot of the content does not change.
oops, I'just check and it turned out that slicing FrameArray (fa[:3]) will return a copy. :D
(I guess c++ vector make a copy when using v.push_back)
I'm not sure how pytraj is using C++ behind-the-scenes, but if you pass a std::vector
itself (rather than a reference to the vector), then I believe it's the equivalent of doing a "pass-by-value", and a new std::vector
will get initialized using the copy constructor, which duplicates the vector contents. If you pass by reference, obviously this doesn't happen. But I don't know enough about C++ to know what you need to do here to enable it to return a view rather than a copy (but @Mojyt might).
As far as stripping, even mdtraj
creates a copy of the trajectories in this case.
from pytraj.
And honesty I don't know what's advantage of using numpy's xyz like the one in mdtraj. if user need raw data? just xyz = traj.xyz. I think most of the time user just need to play with read-only data to do analysis rather chaning xyz.
I disagree. A large number of analyses require RMSD alignment, centering and imaging, or just stripping out the solvent altogether. If I recall details of previous git log messages (they've become too numerous for me to continue following them), this can't be done with a TrajReadOnly as they're immutable. Most common tasks require this, too -- like PCA, calculating RMSDs, B-factor (atomicfluct), average structure calculations, ... etc.
Hi, I should be more clearer (due to my English). I meant pytraj
does not really need xyz
behave like the one in mdtraj
(of course, xyz
numpy is the center of mdtraj
's design, so it's very important). What I meant here is cpptraj
already has tons of actions (I am still simplifying them to be more Pythonic), so I am not sure why pytraj
really need xyz numpy.
A large number of analyses require RMSD alignment, centering and imaging, or just stripping out the solvent altogether.
Yes, I agree with this. But cpptraj
already have all of them and I implemented for Trajectory
too (new name of FrameArray
) (traj.fit_to(ref), traj.autoimage, ...
)
, this can't be done with a TrajReadOnly as they're immutable
I think you're thinking about mdtraj
approach when mentioning this, which mean user might want to use numpy
to update coordinates for a bunch of frames at the same times (thanks to numpy vectorization). With TrajectoryIterator
, user need to iterate to get Frame and modify xyz
for each Frame (not really fancy like numpy
but is more memory efficient for large data).
for frame in TrajectoryIterator():
frame.xyz[:] = some_thing_cool # note: `xyz` here is a numpy array view of frame coords
This approach is the same as cpptraj
and MDAnalysis
. But pytraj
does have mutable Trajectory
so we can in-place update coordinates
traj.autoimage()
traj.strip_atoms('!CA') # inplace strip atoms. this is different from traj['@CA']
traj.fit_to(ref)
an any cpptraj
actions that changing coords applying to traj
will inplace-update its coords too.
For example
In [57]: traj = io.load_sample_data ()[:]
In [58]: traj[0, 0]
Out[58]: array([ 3.32577000e+00, 1.54790900e+00, -1.60000000e-06])
In [59]: for frame in traj: frame[0] = [2., 3., 4.]
In [60]: traj[0, 0] # traj[0, 0] was updated when we update `frame`
Out[60]: array([ 2., 3., 4.])
from pytraj.
FWIW, git is unique in that it separates "content" (i.e., all of the lines of code, git calls them 'blobs') from the files that they're in ("tree" information). So renaming, moving, copying, etc. different files will not actually weigh down a git repository, since a lot of the content does not change.
pytraj
ship python files with cythonized files so things is a bit more complicated here. When I rename "FrameArray" to Trajectory
, I need to update its name in some cython header files (just like C++ header). This force me to recompile whole module and Cython added/removed so many lines to files.
But for other things, you're right. :P
As far as stripping, even mdtraj creates a copy of the trajectories in this case.
No, pytraj
use inplace-stripping (traj.strip_atoms('@ca')). The another (traj['@ca']) is slicing to return a new Trajectory.
from pytraj.
I'm not sure how pytraj is using C++ behind-the-scenes, but if you pass a std::vector itself (rather than a reference to the vector), then I believe it's the equivalent of doing a "pass-by-value", and a new std::vector will get initialized using the copy constructor, which duplicates the vector contents. If you pass by reference, obviously this doesn't happen. But I don't know enough about C++ to know what you need to do here to enable it to return a view rather than a copy (but @Mojyt might).
Yes. in pytraj
the Trajectory
is a wrapper of C++ vector of cpptraj' Frame. So whenever I use v.push_back(frame), vector will create a copy of frame. I did try design differently to use a vector of Frame pointers, from which I can just push_back
the pointers to vector. Using pointer is very dangerous and I got segmentation faul all the time since C++ and Python deallocated the memory (in this case) at the same time. So I just gave up and pleased with current design in pytraj
.
But I might be wrong since my understanding about C++ is limited too.
from pytraj.
@swails
FYI in case you don't know yet
http://stanford.edu/~mwaskom/software/seaborn/
from pytraj.
On Wed, May 6, 2015 at 9:04 PM, Hai Nguyen [email protected] wrote:
I'm not sure how pytraj is using C++ behind-the-scenes, but if you pass a
std::vector itself (rather than a reference to the vector), then I believe
it's the equivalent of doing a "pass-by-value", and a new std::vector will
get initialized using the copy constructor, which duplicates the vector
contents. If you pass by reference, obviously this doesn't happen. But I
don't know enough about C++ to know what you need to do here to enable it
to return a view rather than a copy (but @Mojyt https://github.com/mojyt
might).Yes. in pytraj the Trajectory is a wrapper of C++ vector of cpptraj'
Frame. So whenever I use v.push_back(frame), vector will create a copy of
frame. I did try design differently to use a vector of Frame pointers, from
which I can just push_back the pointers to vector. Using pointer is very
dangerous and I got segmentation faul all the time since C++ and Python
deallocated the memory in this case at the same time. So I just gave up and
pleased with current design in pytraj.
Jason is right about the vector passing in C++. This creates a copy:
MyFunction(std::vector myvec) {}
While this will just pass the reference with no copy:
MyFunction(std::vector const& myvec) {}
The segfault seems odd to me. I'm not 100% clear on how pytraj is sitting
on top of cpptraj, but if you create a vector of pointers to frames that
exist in cpptraj there shouldn't be any double deallocation. I'm guessing
what happens is that the owner of the Frames (underlying cpptraj) isn't
always present, and when it goes out of scope the destructors for the
Frames are called, leaving you pointing to freed memory. If the current way
is not efficient we can talk about what I can do with cpptraj to make it
more efficient.
-Dan
Daniel R. Roe, PhD
Department of Medicinal Chemistry
University of Utah
30 South 2000 East, Room 307
Salt Lake City, UT 84112-5820
http://home.chpc.utah.edu/~cheatham/
(801) 587-9652
(801) 585-6208 (Fax)
from pytraj.
what I meant is
vector<Frame> v
v.push_back(frame) # --> does `v` make a copy of `frame`?
@Mojyt: I am pleased with the current implementation of cpptraj
. But if there's anything we can make better cpptraj + pytraj
, I will discuss with you for sure.
from pytraj.
It's exactly the same concept -- you are passing by value rather than reference (since you declared the std::vector
template to be of type Frame
, rather than Frame*
or Frame&
)
So in response: yes, that makes a copy.
from pytraj.
ah, I see. I will try to see if I can use Frame&
in cython for pytraj
.
Hai
from pytraj.
after trial and error + google + stackoverflow, It's likely that push_back
always make a copy of object (even passing by reference) :D
http://stackoverflow.com/questions/11762474/c-stl-vector-push-back-taking-reference
If I understand correctly, if using vector<Frame>
and doing v.push_back(frame)
, the frame
object is copied twice (to pass as a variable to push_back
and to make a copy for vector container). If using vector<Frame&>
, the frame
object is only copied once (to make a copy for vector container).
this is my idea about slicing Trajectory to give a view
vector<Frame&> v_frame1, v_frame2
// create v_frame1
//try to push to v_frame2
v_frame2.push_back(v_frame1[0])
//update coords in v_frame2[0] and expect this will update coords i v_frame1[0] too.
//But this does not happen since `push_back` always make a copy
I guess I need to create vector<Frame*>
rather than vector<Frame>
. (But like I said, I tried vector<Frame*>
too and got segfault. The memory stuff is very complicated to me now). So, I give up and pleased with current design of pytraj
, (which means slicing Trajectory
will give a new copied Trajectory
)
Hai
from pytraj.
ah,
I make vector<Frame*>
working for pytraj
now (anyway, still feel that C++ is too complicated :D). Great. So now I can slice Trajectory
and return a view
of it.
new update:
wow, if using Trajectory
as a wrapper of vector<Frame>
, I can not subclass it in Python level due to getting double-free memory. However using vector<Frame*>
does the trick and I can freely subclass Trajectory
at Python level. This is great deal since I can freely update python code without recompiling Cython code.
Hai
from pytraj.
Finally I can get a view
when slicing Trajectory
. Thank you guys for stimulating discussion.
In [2]: from pytraj import io
In [3]: from pytraj.testing import aa_eq
In [4]: # load sample to `Trajectory` (tz2.ortho.*)
In [5]: traj = io.load_sample_data("tz2")[:]
In [6]: mylist = [1, 5, 8]
In [7]: # slicing to get 3 frames. `aa_eq` = `assert_almost_equal to 7 decimal place
In [8]: fa0 = traj[mylist]
In [9]: fa1 = traj[mylist]
In [10]: aa_eq(fa0.xyz, fa1.xyz)
In [11]: aa_eq(fa0.xyz, traj[mylist].xyz)
In [12]: # update fa0 and make sure fa1, traj are updated too
In [13]: fa0[0, 0] = [100., 101., 102.]
In [14]: assert fa1[0, 0, 0] == fa0[0, 0, 0] == 100.
In [15]: assert traj[1, 0, 0] == 100.
In [16]: # result
In [17]: print (fa0[0 , 0], fa1[0 , 0], traj[1 , 0])
[ 100. 101. 102.] [ 100. 101. 102.] [ 100. 101. 102.]
from pytraj.
just tested slicing speed and pytraj
is 4 times faster than numpy
version in mdtraj
(and 8 times faster than my older version of FrameArray
with vector<Frame>
In [8]: fa
Out[8]:
<Trajectory with 1000 frames, 17443 atoms/frame>
In [9]: m_traj
Out[9]: <mdtraj.Trajectory with 1000 frames, 17443 atoms, 5666 residues, and unitcells
at 0x2aaab94eb320>
In [10]: s = slice(0, 1000, 5)
In [11]: %timeit fa[s]
10 loops, best of 3: 44.1 ms per loop
In [12]: %timeit m_traj[s]
10 loops, best of 3: 179 ms per loop
from pytraj.
Cython is so cool. I just implemented very fast slicing by add more types for pytraj
in Cython code.
Unbelievably the new slicing method is >= 1600 times my fastest implementation in above test (and therefore about >= 6000 times faster than mdtraj
) :D
In [5]: fa = io.load("remd.000.nc", top="myparm.parm7")
In [6]: fa
Out[6]:
<pytraj.Trajectory with 1000 frames: <Topology with 5634 mols, 5666 residues, 17443 atoms, 17452 bonds, PBC with box type = truncoct>>
In [7]: s = slice(0, 1000, 5)
In [8]: %timeit fa._fast_slice (s)
The slowest run took 4.24 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 27.4 µs per loop
@swails I highly recommend to use Cython if you ever want to speedup your ParmEd. It's good for
others to look at the code if they don't know to write ext in C/C++ like you.
Hai
from pytraj.
I am closing this since names were changed.
from pytraj.
Related Issues (20)
- Pytraj nastruct needs to be updated to handle [nxyz] data HOT 9
- wrong major groove calculation HOT 8
- calculated basepair matches in pytraj HOT 2
- Installation error of pytraj with pip HOT 4
- energy_decompostion throws "Fortran runtime error: Bad value during integer read" HOT 1
- Has anyone tried a Python 3.11 conda package build yet? HOT 11
- Spurious Test Failures with Cpptraj HOT 9
- ModuleNotFoundError: No module named 'pytraj.core.topology_objects' HOT 15
- Help: error building pytraj native library when make install 91% HOT 10
- Function to find the atoms of a given mask that appear in a region of xyz coordinate space? HOT 10
- hbond pmap_mpi output format HOT 7
- Make binary wheel for pytraj? HOT 5
- pytraj.plot module not found HOT 2
- Reporting Issues with Pytraj using tutorials HOT 28
- pytraj segmentation fault with cython 3.0 and cpptraj commit f1d21f688de5ea5c55e650abe6efdbc352bdc613 HOT 1
- pmap parallel error HOT 6
- How do I import pytraj? HOT 4
- nh_order_parameters
- pytraj not compatible with OSX-arm64 arch
- Trying to pip install pytraj HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pytraj.