zincware / znh5md
ZnH5MD - High Performance Interface for H5MD Trajectories
License: Apache License 2.0
Provide a retry option when writing. This could be useful, e.g., if the file is read via zndraw.
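A retry option could look roughly like the sketch below. The helper name write_with_retry, its parameters, and the flaky stand-in writer are hypothetical and not part of ZnH5MD; the sketch also assumes that a failed write surfaces as an OSError.

```python
import time


def write_with_retry(write_fn, retries=3, delay=0.1, exceptions=(OSError,)):
    """Call ``write_fn`` and retry if it raises one of ``exceptions``."""
    for attempt in range(1, retries + 1):
        try:
            return write_fn()
        except exceptions:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(delay)  # wait before trying again


# Hypothetical stand-in writer that fails twice before succeeding.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("file is locked by another process")
    return "written"


result = write_with_retry(flaky_write, retries=5, delay=0.01)
```

Passing the write step as a callable keeps the retry logic independent of how the file is actually written.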
Have znh5md[dask] as an extra install.
see data from zincware/IPSuite#242
Go through atoms.arrays to find additional properties?
Line 12 in 274f7f6
Read from a list of ASE Atoms objects in which only a few have a calculator and the others don't.
This currently raises some strange index errors.
Seems to be untested at the moment
Line 80 in 7dfc684
It would be a great feature, especially for znh5md.ASEH5MD, to have:
loading the next n batches, and not just one at a time, when __iter__ is called (for better performance when looping)
keeping the first 20 atoms objects in memory after atoms[:20] so they are not loaded again

import ase.io
import znh5md

ref = ase.io.read("nvt_eq.gro")
reader = znh5md.io.ChemfilesReader("nvt_eq.xtc")
db = znh5md.io.DataWriter("nvt_eq.h5")
db.initialize_database_groups()
db.add(reader)
x = znh5md.ASEH5MD("nvt_eq.h5")
# Keep a reference to the list so it can be written out afterwards.
atoms_list = x.get_atoms_list()
for atoms in atoms_list:
    atoms.set_atomic_numbers(ref.get_atomic_numbers())
ase.io.write("nvt_eq.xyz", atoms_list)
If a group does not exist in the file but someone tries to access it, raise a GroupNotFound error.
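A minimal sketch of such an error follows. Deriving GroupNotFound from KeyError and the helper get_group are assumptions made for illustration, and a plain dict stands in for an opened HDF5 file.

```python
class GroupNotFound(KeyError):
    """Raised when a requested group is missing from the file."""


def get_group(file, name):
    """Return ``file[name]`` or raise ``GroupNotFound`` with a clear message."""
    try:
        return file[name]
    except KeyError as err:
        raise GroupNotFound(f"Group '{name}' does not exist in the file") from err


# A plain dict stands in for an opened HDF5 file here.
fake_file = {"particles/atoms/position": [[0.0, 0.0, 0.0]]}
positions = get_group(fake_file, "particles/atoms/position")

try:
    get_group(fake_file, "particles/atoms/velocity")
except GroupNotFound as err:
    message = str(err)
```

Subclassing KeyError means existing code that already catches KeyError would keep working.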
Because it is not part of H5MD this should be a kwarg and log a warning that the file might not be readable!
atoms.to_json() and Atoms.from_json() would be nice additions.
ZnH5MD currently assumes a constant number of particles per trajectory. That might not hold, e.g., in the IPSuite use case, so we should support changes in the number of particles.
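One possible way to store frames with different particle counts in a fixed-shape dataset is to pad the shorter frames. The sketch below is only an illustration of that idea with plain lists; the helper names and the choice of NaN as fill value are assumptions, not something the issue prescribes.

```python
import math


def pad_frames(frames, fill=math.nan):
    """Pad every frame with ``fill`` rows up to the largest particle count."""
    n_max = max(len(frame) for frame in frames)
    return [
        list(frame) + [[fill] * 3 for _ in range(n_max - len(frame))]
        for frame in frames
    ]


def unpad_frame(frame):
    """Drop the padding rows again when reading a frame back."""
    return [row for row in frame if not any(math.isnan(x) for x in row)]


# Stand-in positions: the first frame has 2 particles, the second has 3.
frames = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
]
padded = pad_frames(frames)
```

Padding only works if the fill value can never occur in real data, which is why NaN is used here rather than zero.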
Line 206 in 2b1a654
Loading the trajectory with MDAnalysis raises
KeyError: "Can't open attribute (can't locate attribute: 'dimension')"
when using
import MDAnalysis as mda
u = mda.Universe.empty(n_atoms=300)
u.load_new('nodes/ASEMD/trajectory.h5', format='H5MD', in_memory=True)
Line 69 in 6565dda
ASE seems to raise a RuntimeError, not a PropertyNotImplementedError, if you try to get e.g. energies from atoms without a calculator.
A group of either time dependent or time independent data:
/connectivity/
see https://h5md.nongnu.org/h5md.html#connectivity-group
class Atoms(ase.Atoms):
    bonds = None  # proposed attribute for the connectivity data

    def get_graph(self) -> "networkx.Graph":  # proposed method
        ...
also see https://github.com/imagdau/aseMolec
Concatenate n files into a single one.
Currently one has to do:
import ase.io
import znh5md

ref = ase.io.read("nvt_eq.gro")
reader = znh5md.io.ChemfilesReader("nvt_eq.xtc")
db = znh5md.io.DataWriter("nvt_eq.h5")
db.initialize_database_groups()
db.add(reader)
x = znh5md.ASEH5MD("nvt_eq.h5")
# Keep a reference to the list so it can be written out afterwards.
atoms_list = x.get_atoms_list()
for atoms in atoms_list:
    atoms.set_atomic_numbers(ref.get_atomic_numbers())
ase.io.write("nvt_eq.xyz", atoms_list)
but would like to do:
reader = znh5md.io.ChemfilesReader("nvt_eq.xtc")
for atoms in reader.get_atoms_list(): ...
Since this operation maps over a list, it can use
import multiprocessing
num_cores = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=num_cores) as pool:
    structures = pool.map(process_single_structure, data_array)
for better performance.
There are a few different possibilities to solve this:
Assume species always exists and add species=["Na", "Cl"]. Iterate over the species dataset and create a mask for every step / batch.
Pass traj.species to the get_dataset (or some variation) together with a filter, for full flexibility if you want to filter by species, ids, charge, ...
ZnH5MD/znh5md/znh5md/__init__.py
Line 157 in 6625f19
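The mask idea from the first possibility can be sketched with plain lists for a single step. The helper names species_mask and apply_mask and the data are hypothetical stand-ins, not the ZnH5MD API.

```python
def species_mask(species, selected):
    """Return one boolean per particle: True if its species is selected."""
    selected = set(selected)
    return [s in selected for s in species]


def apply_mask(values, mask):
    """Keep only the per-particle values whose mask entry is True."""
    return [v for v, keep in zip(values, mask) if keep]


# One step with four particles; species and positions are stand-in data.
species = ["Na", "Cl", "O", "Na"]
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]]

mask = species_mask(species, ["Na", "Cl"])
selected_positions = apply_mask(positions, mask)
```

The same mask could be reused for any other per-particle property of that step, such as ids or charges.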
More general: how to handle time independent data?
Calling slice_by_species is very slow
ZnH5MD doesn't seem to work on Windows operating systems
TensorFlow does a lot more than what we require here. We could also return numpy arrays and use something like the following https://stackoverflow.com/questions/7323664/python-generator-pre-fetch for prefetching.
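A prefetching generator without TensorFlow could look roughly like this sketch, which uses a background thread and a bounded queue from the standard library. The names prefetch and load_batches are hypothetical, and load_batches is only a stand-in for reading batches from a file.

```python
import queue
import threading


def prefetch(iterable, buffer_size=2):
    """Yield items from ``iterable`` while a thread loads the next ones."""
    buffer = queue.Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the stream

    def producer():
        try:
            for item in iterable:
                buffer.put(item)  # blocks while the buffer is full
        finally:
            buffer.put(sentinel)  # always signal the end, even on errors

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is sentinel:
            return
        yield item


def load_batches(n_batches, batch_size):
    """Stand-in for reading batches of frames from a file."""
    for start in range(0, n_batches * batch_size, batch_size):
        yield list(range(start, start + batch_size))


batches = list(prefetch(load_batches(n_batches=3, batch_size=2)))
```

The buffer size bounds how many batches are held in memory ahead of the consumer. In this sketch an error in the producer simply ends the stream early instead of being re-raised in the consumer.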
Currently Energy is stored in the particles group but it would fit better into /observables/Atoms/...
It could be possible to write to H5MD via traj.positions = tf.data.Dataset. A property setter is probably not sufficient, because one might want to add some arguments, so maybe znh5md.update_database(property=traj.position, dataset=tf.data.Dataset, append=True).
The todict / fromdict methods don't support calculators. Use SinglePointCalculator to also support this information. Support the full results.
Also add a to_json() method and a from_json() method.
Concatenate the observables, export to calc and arrays into a single file.
Support prefix per file
Consider adding progress bars
This library is designed around ase. I don't think the dask features are used by anyone.
Thus I'd suggest redesigning it fully around ase.
This would include
pyh5md
ase.io.iread compatibility (eventually make this an ASE feature)
ase.io.write
Some context managers for better performance
see https://gitlab.com/ase/ase/-/merge_requests/2387
ZnH5MD often uses file = open(filename) without closing it.
We should check if we can replace this with context managers.
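For reference, the context-manager form of open looks like the sketch below. The temporary file and its content are placeholders for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# The file is closed when the block exits, even if an exception is raised.
with open(path, "w") as file:
    file.write("positions\n")

with open(path) as file:
    content = file.read()
```

h5py.File can be used in a with statement in the same way.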
Maybe use https://wiki.fysik.dtu.dk/ase/ase/gui/gui.html to view the trajectories (write to temporary path or something)
H5MD does not allow for PBC to be dynamic. For IPS we can either
Allow for:
dataset.position.slice_by_species(species="B")
dataset.position.slice_by_species(species=["B", "F"])
and for the future, also support molecules / COM