mdanalysis / pytng
Python bindings for TNG file format
Home Page: http://mdanalysis.org/pytng
License: BSD 3-Clause "New" or "Revised" License
A release of pytng is necessary to allow for Python 3.12 support. This is blocking the core library here: MDAnalysis/mdanalysis#4327
In theory, TNG files can have variable numbers of atoms between frames, e.g. from a grand canonical simulation. This is indicated by a flag in the TNG file:
typedef enum
{
    TNG_CONSTANT_N_ATOMS,
    TNG_VARIABLE_N_ATOMS
} tng_variable_n_atoms_flag;
While this is not supported upstream by MDA, we should look into whether introducing it is worth it for completeness' sake.
The pytng tarball is currently larger than the MDAnalysis tarball size (by roughly 3x).
The main source of this is the inclusion of test files as part of the tarball. I'm opening this issue to see if this is required / can be addressed in some way.
Builds with the above versions of NumPy, Cython and Python don't work. This seems related to Cython issue #2938.
We might need to pin to versions below this.
Currently pickling of the Reader instance is not supported. It would be handy not to have to handle this upstream in MDAnalysis and instead push it down to this level.
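A minimal sketch of how pickling could be handled at the reader level, assuming the reader can be rebuilt from its filename and current position. PicklableReader is a hypothetical stand-in that wraps a plain binary file; it is not the pytng API.

```python
import os
import pickle
import tempfile


class PicklableReader:
    """Hypothetical stand-in for a trajectory reader (not the pytng API)."""

    def __init__(self, filename):
        self.filename = filename
        self._fh = open(filename, "rb")

    def read(self, n):
        return self._fh.read(n)

    def tell(self):
        return self._fh.tell()

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # File handles cannot be pickled, so store only what is needed
        # to reopen the file at the same position.
        return {"filename": self.filename, "pos": self._fh.tell()}

    def __setstate__(self, state):
        self.filename = state["filename"]
        self._fh = open(self.filename, "rb")
        self._fh.seek(state["pos"])


# Usage: round-trip a reader through pickle and keep its position.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
reader = PicklableReader(tmp.name)
reader.read(4)
clone = pickle.loads(pickle.dumps(reader))
clone_pos = clone.tell()
clone_next = clone.read(2)
reader.close()
clone.close()
os.remove(tmp.name)
```

The same idea would presumably apply to a real reader, with the stored position being a step rather than a byte offset.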
Use https://github.com/python-versioneer/python-versioneer to handle version information.
__version__
conf.py
use the generated version information
Should we drop or keep Python 2.7 support? Tests are currently failing on 2.7, so I thought this an opportune time to discuss.
PyTNG.tell seems to count from 1 while everything else counts from 0:
with pytng.TNGFile('/home/jon/dev/pytng/tests/reference_files/tng_example.tng') as tng:
    print(tng[9].step) # 9
    print(tng.tell()) # 10
I guess this is because tell returns the step that is about to be read, which is the one after the current step. This is misleading because it is not what the docstring says, and also because I cannot directly use the result of tell to later seek to the same frame. It also differs from the behaviour expected from the way files usually work:
with open('any-file') as a:
    print(a.tell()) # 0
    a.seek(10)
    print(a.tell()) # 10
I've been playing with v0.2.2-alpha and am puzzled by the behavior for empty steps.
Setup:
In [64]: tng = pytng.TNGFileIterator("reference_files/argon_npt_compressed.tng", 'r')
In [65]: positions = tng.make_ndarray_for_block_from_name("TNG_TRAJ_POSITIONS")
As expected: accessing a step with no data (the stride is 5000) gives NAN:
In [66]: tng[0].get_positions(positions)
Out[66]:
array([[2.5329998 , 1.244 , 3.5059998 ],
[0.8299999 , 2.544 , 3.4479997 ],
[1.091 , 0.10999999, 3.1289997 ],
...,
[3.2949998 , 2.8899999 , 1.9889998 ],
[0.22999999, 0.92599994, 1.0619999 ],
[3.0709999 , 2.495 , 3.5249999 ]], dtype=float32)
In [67]: tng[1].get_positions(positions)
Out[67]:
array([[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]], dtype=float32)
In [68]: tng[2].get_positions(positions)
Out[68]:
array([[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]], dtype=float32)
In [69]: tng[0].get_positions(positions)
Out[69]:
array([[2.5329998 , 1.244 , 3.5059998 ],
[0.8299999 , 2.544 , 3.4479997 ],
[1.091 , 0.10999999, 3.1289997 ],
...,
[3.2949998 , 2.8899999 , 1.9889998 ],
[0.22999999, 0.92599994, 1.0619999 ],
[3.0709999 , 2.495 , 3.5249999 ]], dtype=float32)
In [70]: tng[1].get_positions(positions)
Out[70]:
array([[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan],
...,
[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]], dtype=float32)
Now looking at steps at the end of the trajectory that should not contain data (-2) vs one that does (the last frame). Just to show the frame numbers:
In [81]: range(0, tng.n_steps, tng.block_strides["TNG_TRAJ_POSITIONS"])
Out[81]: range(0, 500001, 5000)
In [82]: list(range(0, tng.n_steps, tng.block_strides["TNG_TRAJ_POSITIONS"]))[-1]
Out[82]: 500000
In [85]: list(range(0, tng.n_steps))[-1]
Out[85]: 500000
i.e. the last frame with data is -1 from the unsliced trajectory.
But when I access steps from the end of the trajectory, the data for -2 just repeats the data from the last frame with data (-1) instead of giving a NAN array:
In [71]: tng[-1].get_positions(positions)
Out[71]:
array([[0.43999997, 0.38899997, 1.374 ],
[1.4319999 , 1.6489999 , 2.939 ],
[2.0149999 , 2.103 , 2.6569998 ],
...,
[2.0849998 , 3.5509996 , 1.4359999 ],
[0.15599999, 3.5019999 , 0.31399998],
[1.2889999 , 0.9979999 , 1.6449999 ]], dtype=float32)
In [72]: tng[-2].get_positions(positions)
Out[72]:
array([[0.43999997, 0.38899997, 1.374 ],
[1.4319999 , 1.6489999 , 2.939 ],
[2.0149999 , 2.103 , 2.6569998 ],
...,
[2.0849998 , 3.5509996 , 1.4359999 ],
[0.15599999, 3.5019999 , 0.31399998],
[1.2889999 , 0.9979999 , 1.6449999 ]], dtype=float32)
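For reference, the expected result can be worked out without touching the file. This is a small sketch, assuming data is written at every multiple of the stride starting from step 0 (as the range expression above suggests). The helper names normalize_step and has_data are made up for illustration and are not part of pytng.

```python
def normalize_step(step, n_steps):
    """Map a possibly negative step index onto the range [0, n_steps)."""
    if step < 0:
        step += n_steps
    if not 0 <= step < n_steps:
        raise IndexError(f"step {step} out of range for {n_steps} steps")
    return step


def has_data(step, n_steps, stride):
    """True if a block written every `stride` steps has data at `step`."""
    return normalize_step(step, n_steps) % stride == 0


# Values taken from the session above.
N_STEPS = 500001
STRIDE = 5000

last = has_data(-1, N_STEPS, STRIDE)         # step 500000
second_last = has_data(-2, N_STEPS, STRIDE)  # step 499999
```

Under that assumption, -1 maps to a step with data and -2 to a step without, so tng[-2] would be expected to give a NAN array just like tng[1] does.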
build_sphinx is not working while the Sphinx Makefile is.
Note that adding a build_sphinx config in setup.cfg broke doc deployment (see #60).
The FileIterator.n_data_frames attribute doesn't appear to work for a trajectory that has gone through TRJCONV.
Following some discussions with @richardjgowers, we have decided that the data should be read with something that iterates blocks rather than TNGFrame steps. This means that progress will halt on #29 for now. :)
I guess it is because writing is not implemented yet, but when opening a file in w mode the file has to exist.
>>> import pytng
>>> pytng.TNGFile('new.tng', mode='w')
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-2-992be612e3da> in <module>()
----> 1 pytng.TNGFile('new.tng', mode='w')
pytng/pytng.pyx in pytng.pytng.TNGFile.__cinit__ (pytng/pytng.c:2339)()
pytng/pytng.pyx in pytng.pytng.TNGFile.open (pytng/pytng.c:2679)()
OSError: file does not exists: new.tng
This is due to line 105 of pytng.pyx, which tests whether the file exists regardless of the mode. Note that the test is repeated at line 121, but this time testing for the mode. (By the way, the first test uses os.path.isfile, which looks more robust than the os.path.exists used by the second one.)
Maybe a more "accurate" way of dealing with writing not being supported yet would be to raise a NotImplementedError just after line 112.
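A rough sketch of the check order suggested here. The function check_open_args and its messages are hypothetical and only illustrate the idea; this is not the actual code in pytng.pyx.

```python
import os
import tempfile


def check_open_args(path, mode):
    """Hypothetical validation helper; not the actual pytng code."""
    if mode not in ("r", "w"):
        raise ValueError(f"mode must be 'r' or 'w', got {mode!r}")
    if mode == "w":
        # Writing is not supported yet, so say so explicitly instead of
        # complaining that the output file does not exist.
        raise NotImplementedError("writing TNG files is not implemented yet")
    if not os.path.isfile(path):
        raise OSError(f"file does not exist: {path}")


def error_name(path, mode):
    """Return the name of the exception raised, or None if the check passes."""
    try:
        check_open_args(path, mode)
    except Exception as exc:
        return type(exc).__name__
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    missing = os.path.join(tmpdir, "new.tng")
    write_result = error_name(missing, "w")
    read_result = error_name(missing, "r")
```

With this ordering the existence test only runs in read mode, so a missing file in w mode reports the unsupported feature rather than a missing file.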
We should have a full list of blocks and their platform-independent definitions (block_ids) listed in the PyTNG docs somewhere.
with pytng.TNGFile('/home/jon/dev/pytng/tests/reference_files/tng_example.tng') as tng:
    for frame in tng:
        pass
    # We iterated through the whole file. `tng.reached_eof` is `True`.
    tng.seek(0)
    print(tng.tell())
    # We are back at the beginning of the file, far from the end of file.
    print(tng[4])
I expect to get a frame, instead this raises:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-23-d4513426d8e3> in <module>()
4 tng.seek(0)
5 print(tng.tell())
----> 6 print(tng[4])
pytng/pytng.pyx in pytng.pytng.TNGFile.__getitem__ (pytng/pytng.c:6855)()
pytng/pytng.pyx in pytng.pytng.TNGFile.read (pytng/pytng.c:4523)()
OSError: Reached last frame in TNG, seek to 0
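To make the expected behaviour concrete, here is a toy reader backed by a list. FrameReader and its reached_eof attribute are hypothetical stand-ins, not the pytng implementation; the point is only that a successful seek clears the end-of-file flag.

```python
class FrameReader:
    """Hypothetical list-backed reader used to illustrate the expected behaviour."""

    def __init__(self, frames):
        self._frames = list(frames)
        self._pos = 0
        self.reached_eof = False

    def seek(self, step):
        if not 0 <= step < len(self._frames):
            raise IndexError(f"step {step} out of range")
        self._pos = step
        # Moving to a valid step means we are no longer at the end of file.
        self.reached_eof = False

    def tell(self):
        return self._pos

    def read(self):
        if self.reached_eof:
            raise OSError("Reached last frame, seek to 0")
        frame = self._frames[self._pos]
        self._pos += 1
        if self._pos == len(self._frames):
            self.reached_eof = True
        return frame

    def __iter__(self):
        while not self.reached_eof:
            yield self.read()

    def __getitem__(self, step):
        self.seek(step)
        return self.read()


reader = FrameReader(range(10))
for _ in reader:
    pass
eof_after_loop = reader.reached_eof
frame = reader[4]
```

Here indexing after a full iteration works because __getitem__ goes through seek, which resets the flag before reading.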
When the API is stable we should package for conda and pip!
The package cannot be built on Windows due to problems with zlib pathing.
Currently we assume that all the TNG blocks in a TNG file are present at the first timestep. This should be valid for the vast majority of TNG files, but may not be for all. Thus we should find a way to iterate over the whole file and check whether there are any off-stride blocks.
Update the docs theme with the new UserGuide-like theme and add the Goat Counter page counter
(Also check that deployment works as it should, possibly update .travis.yml #5 )
Currently there is a catch-all read_success method that indicates both data presence and reading success. This should be split into two separate paths, along with better exceptions.
Probably would also drop some older versions of python at the same time tbh.
Need to add a test for correct documentation build to Travis CI.
I broke the doc deployment with #57. Will fix.
We should check that we allow UTF-8 characters in filenames.
We are missing a long description on PyPI.
Things to do for the release
Some possible issues with tng_util_trajectory_next_frame_present_data_blocks_find were identified in Chemfiles:
See chemfiles/chemfiles#430 and https://gitlab.com/gromacs/tng/-/issues/18.
AFAIK, there is no problem in PyTNG ATM, as the step variable is returned incorrectly and we don't use it, but it would be good to patch this when it is fixed upstream.
When calling seek on a file that is not open, the exception says "seek not allowed in write mode" instead of a relevant message about the file not being open.
with pytng.TNGFile('/home/jon/dev/pytng/tests/reference_files/adk_oplsaa.tng') as tng:
    pass
tng.seek(0)
raises
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-3-bc96c1499ec9> in <module>()
1 with pytng.TNGFile('/home/jon/dev/pytng/tests/reference_files/adk_oplsaa.tng') as tng:
2 pass
----> 3 tng.seek(0)
pytng/pytng.pyx in pytng.pytng.TNGFile.seek (pytng/pytng.c:5997)()
OSError: seek not allowed in write mode
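One way to get a more relevant message would be to test for a closed file before testing the mode. The class below is a hypothetical toy, not the pytng code, and the attribute names and error messages are assumptions chosen for illustration.

```python
class ClosedAwareFile:
    """Hypothetical reader showing the order of the state checks."""

    def __init__(self, mode="r"):
        self.mode = mode
        self.is_open = True
        self.step = 0

    def close(self):
        self.is_open = False

    def seek(self, step):
        # Check for a closed file first, so the message names the real problem.
        if not self.is_open:
            raise OSError("seek not allowed: file is not open")
        if self.mode != "r":
            raise OSError("seek not allowed in write mode")
        self.step = step

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with ClosedAwareFile() as fh:
    pass

try:
    fh.seek(0)
    message = None
except OSError as exc:
    message = str(exc)
```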
@kain88-de I've added a file in tests/ called crash.py. It tries to open a nonexistent file, but this causes a bad crash.
Follow up from: MDAnalysis/mdanalysis#3036
As our Travis allocation is organisation-wide, it might be best to switch over to GitHub Actions where possible.
I forgot to squash commits, then reverted my PR, thereby making an absolute mess.
The old PyTNG API should be removed once we are happy with the newer API.
https://www.mdanalysis.org/pytng/sitemap.xml contains broken links because a version string was inserted.
The configuration for the sitemap sphinx plugin must be updated, see MDAnalysis/MDAnalysis.github.io#202
The solution appears to be to set the following in conf.py
sitemap_url_scheme = "{link}"
so that the version is not included.
One of the strengths of the TNG format is that it contains an embedded mtop.
We should use this information.
The docs at https://www.mdanalysis.org/pytng show version 0.1. We should dynamically get pytng.__version__.
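A small sketch of how conf.py could derive its version fields from the package. Sphinx reads the `version` (short X.Y) and `release` (full string) variables from conf.py; the helper name sphinx_version_fields is made up, and "0.2.2" is only an illustrative value.

```python
def sphinx_version_fields(full_version):
    """Split a full version string into Sphinx's (version, release) pair."""
    # Sphinx convention: `release` is the full string, `version` is the
    # short X.Y form.
    release = full_version
    version = ".".join(full_version.split(".")[:2])
    return version, release


# In conf.py this might be used as (assuming pytng is importable at build time):
#     import pytng
#     version, release = sphinx_version_fields(pytng.__version__)
version, release = sphinx_version_fields("0.2.2")
```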
Design questions
How will we figure out what data is accessible at each frame? Not all trajectories have positions and box at every frame interval, let alone other data like forces and velocities, which can be written at whatever interval you specify. (A whole list of available blocks is in tng_io.h.)
Thus we need to find a way to not try and pull all the info at every step.
FYI
The TNG syntax for "frames" is also a bit confusing, as it really is the number of actual MD steps (nsteps in GROMACS).
Data is written with a stride_length, i.e. data is deposited every stride_length "frames". I just thought I would bring this up as it tripped me up repeatedly.
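As a starting point for the design question, the blocks available at a given step can be derived from a mapping of block name to stride. This sketch assumes every block starts at step 0 and is written at exact multiples of its stride. The function blocks_at_step is hypothetical and the strides are made-up example values.

```python
def blocks_at_step(step, block_strides):
    """Return the names of blocks whose stride divides `step`."""
    return sorted(
        name for name, stride in block_strides.items() if step % stride == 0
    )


# Hypothetical strides, for illustration only.
block_strides = {
    "TNG_TRAJ_POSITIONS": 5000,
    "TNG_TRAJ_BOX_SHAPE": 5000,
    "TNG_TRAJ_VELOCITIES": 10000,
}

at_0 = blocks_at_step(0, block_strides)
at_5000 = blocks_at_step(5000, block_strides)
at_1 = blocks_at_step(1, block_strides)
```

A lookup like this would let a reader request only the blocks that exist at a step instead of trying to pull everything every time.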
See e.g. https://travis-ci.org/github/MDAnalysis/pytng/jobs/719409548
==================================== ERRORS ====================================
________________________ ERROR collecting test session _________________________
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/pluggy/hooks.py:289: in __call__
return self._hookexec(self, self.get_hookimpls(), kwargs)
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/pluggy/manager.py:87: in _hookexec
return self._inner_hookexec(hook, methods, kwargs)
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/pluggy/manager.py:81: in <lambda>
firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/pytest_pep8.py:38: in pytest_collect_file
return Pep8Item(path, parent, pep8ignore, config._max_line_length)
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/_pytest/nodes.py:95: in __call__
warnings.warn(NODE_USE_FROM_PARENT.format(name=self.__name__), stacklevel=2)
E pytest.PytestDeprecationWarning: Direct construction of Pep8Item has been deprecated, please use Pep8Item.from_parent.
E See https://docs.pytest.org/en/stable/deprecations.html#node-construction-changed-to-node-from-parent for more details.
Use of a bit array that indicates data present could greatly improve data access patterns.
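A minimal sketch of what such a bit array might look like in pure Python, with one bit per step stored in a bytearray. StepBitmap is a hypothetical name and the numbers are illustrative; a real implementation would likely live at the Cython level.

```python
class StepBitmap:
    """Compact record of which steps hold data: one bit per step."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self._bits = bytearray((n_steps + 7) // 8)

    def mark(self, step):
        self._bits[step // 8] |= 1 << (step % 8)

    def __contains__(self, step):
        if not 0 <= step < self.n_steps:
            return False
        return bool(self._bits[step // 8] & (1 << (step % 8)))


# Mark every 5th step of a 21-step trajectory (illustrative numbers).
bitmap = StepBitmap(21)
for step in range(0, 21, 5):
    bitmap.mark(step)

present = [step for step in range(21) if step in bitmap]
```

Membership tests are then a single byte lookup and mask, and the whole map for N steps takes about N/8 bytes.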
In theory the TNG format can do grand-canonical simulation. We currently do not support this or check for it. This should be addressed in tests as well.
There are some questions about support for this upstream; I am unsure how this works in GROMACS or otherwise.
For some insane reason I uploaded the last release without making tags on master. I'll bump to 2.3.1 and push the new packages once #99 is fixed.