retry if file locked
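A minimal sketch of the retry, assuming a locked HDF5 file shows up as an `OSError` when it is opened (h5py typically reports "unable to lock file" this way). `retry_on_lock` is a hypothetical helper, not part of ZnH5MD. It retries with exponential backoff and re-raises the last error:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_lock(open_file: Callable[[], T], retries: int = 5, delay: float = 0.1) -> T:
    """Call ``open_file`` until it succeeds, retrying on OSError.

    Assumption: a locked file surfaces as an OSError; adapt the exception
    type if the backend raises something else.
    """
    for attempt in range(retries):
        try:
            return open_file()
        except OSError:
            if attempt == retries - 1:
                raise  # give up and re-raise the last error
            time.sleep(delay * 2**attempt)  # exponential backoff
    raise ValueError("retries must be >= 1")
```

Usage would look like `retry_on_lock(lambda: h5py.File("traj.h5", "r"))`.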

an iterator that prefetches data (when `__iter__` is called, load the next n batches ahead instead of one at a time, for better performance when looping)
a lazy sequence: if `atoms[:20]` is accessed, the first 20 `Atoms` objects are kept in memory and not loaded again
an interface based on NumPy rather than Dask, because Dask offers no benefit here
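A sketch of the lazy, caching sequence, assuming a hypothetical per-frame loader function; only the caching logic is shown, not the H5MD reading itself:

```python
from typing import Callable, Dict, Generic, List, TypeVar, Union

T = TypeVar("T")


class LazySequence(Generic[T]):
    """Load items on first access and keep them cached afterwards.

    ``load`` is a hypothetical per-frame loader, e.g. one that builds an
    ``ase.Atoms`` object from frame ``i`` of an H5MD file.
    """

    def __init__(self, load: Callable[[int], T], length: int) -> None:
        self._load = load
        self._length = length
        self._cache: Dict[int, T] = {}

    def __len__(self) -> int:
        return self._length

    def _get(self, index: int) -> T:
        if index not in self._cache:
            self._cache[index] = self._load(index)  # loaded once, reused later
        return self._cache[index]

    def __getitem__(self, key: Union[int, slice]) -> Union[T, List[T]]:
        if isinstance(key, slice):
            # slice.indices resolves None/negative bounds against the length
            return [self._get(i) for i in range(*key.indices(self._length))]
        if key < 0:
            key += self._length
        if not 0 <= key < self._length:
            raise IndexError(key)
        return self._get(key)
```

Accessing `seq[:20]` twice calls the loader only 20 times in total.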

`znh5md convert`: support `*.gro` and `*.xtc`

import ase.io
import znh5md

ref = ase.io.read("nvt_eq.gro")

reader = znh5md.io.ChemfilesReader("nvt_eq.xtc")
db = znh5md.io.DataWriter("nvt_eq.h5")
db.initialize_database_groups()
db.add(reader)

x = znh5md.ASEH5MD("nvt_eq.h5")
atoms_list = x.get_atoms_list()
for atoms in atoms_list:
    # .xtc stores no element information, so take it from the .gro reference
    atoms.set_atomic_numbers(ref.get_atomic_numbers())
ase.io.write("nvt_eq.xyz", atoms_list)

add `GroupNotFound` error

If a group does not exist in the file but someone tries to access it, raise a `GroupNotFound` error.
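A minimal sketch, assuming `GroupNotFound` subclasses `KeyError` so that existing code catching `KeyError` keeps working. `get_group` is a hypothetical helper; it works with `h5py.File`/`h5py.Group` or any other mapping:

```python
from typing import Any, Mapping


class GroupNotFound(KeyError):
    """Raised when a requested group does not exist in the H5MD file."""


def get_group(h5file: Mapping[str, Any], path: str) -> Any:
    # h5py.File and h5py.Group support ``in`` and ``[]`` like a mapping,
    # so the same check works for a real file and for a plain dict.
    if path not in h5file:
        raise GroupNotFound(f"Group '{path}' does not exist in the file")
    return h5file[path]
```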

Make variable PBC optional

Because variable PBC are not part of the H5MD specification, this should be opt-in via a keyword argument, and a warning should be logged that the file might not be readable by other tools.
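A sketch of the opt-in behaviour, assuming a hypothetical `prepare_pbc` step before writing: constant PBC are stored once, and per-frame PBC are only stored when `variable_pbc=True`, in which case a warning is logged:

```python
import logging
from typing import List, Sequence, Tuple, Union

log = logging.getLogger(__name__)

PBC = Tuple[bool, bool, bool]


def prepare_pbc(frames_pbc: Sequence[PBC], variable_pbc: bool = False) -> Union[PBC, List[PBC]]:
    """Return the PBC to store: a single value, or one per frame if opted in."""
    first = tuple(frames_pbc[0])
    if all(tuple(p) == first for p in frames_pbc):
        return first  # constant PBC: store once, as plain H5MD expects
    if not variable_pbc:
        raise ValueError("PBC change between frames; pass variable_pbc=True to store them per frame")
    log.warning("Storing variable PBC; this is not part of H5MD and the file might not be readable by other tools.")
    return [tuple(p) for p in frames_pbc]
```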

add `atoms.to_json()` method for easy serialization

`atoms.to_json()` and `Atoms.from_json()` would be nice additions.

Support for changing number of particles

Currently ZnH5MD assumes a constant number of particles per trajectory. In the IPSuite use case, for example, that might not hold, so changes in the number of particles should be supported.
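One common way to store a varying number of particles in fixed-shape datasets is to pad every frame to the largest particle count with NaN and strip the padding on read. The sketch below assumes that approach and uses plain lists; `pad_frames` and `unpad_frame` are hypothetical helpers:

```python
import math
from typing import List

Frame = List[List[float]]  # one [x, y, z] row per particle


def pad_frames(frames: List[Frame]) -> List[Frame]:
    """Pad every frame with NaN rows up to the largest particle count."""
    n_max = max(len(frame) for frame in frames)
    nan_row = [math.nan, math.nan, math.nan]
    return [frame + [list(nan_row) for _ in range(n_max - len(frame))] for frame in frames]


def unpad_frame(frame: Frame) -> Frame:
    """Drop the all-NaN padding rows again when reading a frame back."""
    return [row for row in frame if not all(math.isnan(x) for x in row)]
```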

support pathlib

See `ZnH5MD/znh5md/io/reader.py`, line 206 in 2b1a654, where the argument is annotated as `filename: str`.
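A minimal sketch, with a hypothetical `FileReader` standing in for the reader class: annotate the argument as `Union[str, os.PathLike]` and normalize it with `pathlib.Path`:

```python
import os
import pathlib
from typing import Union


class FileReader:  # hypothetical stand-in for the reader in reader.py
    def __init__(self, filename: Union[str, os.PathLike]) -> None:
        # pathlib.Path accepts str as well as any os.PathLike object
        self.filename = pathlib.Path(filename)
```

`os.fspath(reader.filename)` converts back to `str` if a downstream library requires one.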

MDAnalysis can't read files

KeyError: "Can't open attribute (can't locate attribute: 'dimension')"

when loading a file with:

import MDAnalysis as mda

u = mda.Universe.empty(n_atoms=300)
u.load_new('nodes/ASEMD/trajectory.h5', format='H5MD', in_memory=True)

Test without a calculator

See `ZnH5MD/znh5md/io/reader.py`, line 69 in 6565dda:

except PropertyNotImplementedError as err:

ASE seems to raise a `RuntimeError` rather than a `PropertyNotImplementedError` when you try to get, e.g., energies from an `Atoms` object without a calculator.
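A sketch of a more tolerant property getter; `try_get_property` is hypothetical. It relies on ASE's `PropertyNotImplementedError` being a subclass of `NotImplementedError`, so catching `NotImplementedError` covers it, while `RuntimeError` covers the missing-calculator case:

```python
from typing import Callable, Optional


def try_get_property(getter: Callable[[], float]) -> Optional[float]:
    """Return the property, or None if no calculator can provide it."""
    try:
        return getter()
    except (RuntimeError, NotImplementedError):
        # RuntimeError: Atoms object has no calculator at all
        # NotImplementedError: includes ASE's PropertyNotImplementedError
        return None
```

Usage would be `try_get_property(atoms.get_potential_energy)`.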

add `/connectivity` group

A group for either time-dependent or time-independent data:

/connectivity/

bonds
bond_order ?
particle_group

see https://h5md.nongnu.org/h5md.html#connectivity-group

import ase
import networkx as nx


class Atoms(ase.Atoms):
    bonds: list  # e.g. pairs of bonded atom indices

    def get_graph(self) -> nx.Graph:
        ...

also see https://github.com/imagdau/aseMolec
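A stdlib-only sketch of what a bond graph could enable: grouping atoms into molecules from a list of bond index pairs. `molecules_from_bonds` is hypothetical; with networkx, `nx.connected_components(graph)` gives the same grouping:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def molecules_from_bonds(n_atoms: int, bonds: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Group atom indices into molecules (connected components of the bond graph)."""
    neighbors: Dict[int, Set[int]] = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    seen: Set[int] = set()
    molecules = []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            atom = stack.pop()
            component.append(atom)
            for other in neighbors[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        molecules.append(sorted(component))
    return molecules
```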

`znh5md concat`

To concatenate n files into a single one

Support reader to atoms_list

Currently one has to do:

import ase.io
import znh5md

ref = ase.io.read("nvt_eq.gro")

reader = znh5md.io.ChemfilesReader("nvt_eq.xtc")
db = znh5md.io.DataWriter("nvt_eq.h5")
db.initialize_database_groups()
db.add(reader)

x = znh5md.ASEH5MD("nvt_eq.h5")
atoms_list = x.get_atoms_list()
for atoms in atoms_list:
    # .xtc stores no element information, so take it from the .gro reference
    atoms.set_atomic_numbers(ref.get_atomic_numbers())
ase.io.write("nvt_eq.xyz", atoms_list)

but it would be nicer to do:

reader = znh5md.io.ChemfilesReader("nvt_eq.xtc")
for atoms in reader.get_atoms_list(): ...

use `multiprocessing` for `remove_nan`

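Because this operation maps a function over a list of structures, it could use

import multiprocessing

# process_single_structure must be a module-level function so it can be pickled
num_cores = multiprocessing.cpu_count()
with multiprocessing.Pool(processes=num_cores) as pool:
    structures = pool.map(process_single_structure, data_array)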

for better performance.

add mask by species / name, wrapping `traj.position.get_dataset(species=["Na", "Cl"])` or `species=[1, 2]`

There are a few different possibilities to solve this:

Assume the indices are ordered: create a mask once and apply it to every batch.
Assume that the `species` key always exists and add `species=["Na", "Cl"]`; iterate over the species dataset and create a mask for every step / batch.
Pass a dataset, e.g. `traj.species`, to `get_dataset` (or a variation of it) together with a filter, for full flexibility when filtering by species, IDs, charge, ...
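The first option (a mask computed once) could look like the sketch below. It assumes species are stored as atomic numbers and that particle order never changes between steps; `species_mask` and `apply_mask` are hypothetical helpers, not existing ZnH5MD functions:

```python
from typing import Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def species_mask(species: Sequence[int], selected: Iterable[int]) -> List[int]:
    """Indices of particles whose species is in ``selected`` (computed once)."""
    wanted = set(selected)
    return [i for i, s in enumerate(species) if s in wanted]


def apply_mask(batches: Iterable[Sequence[Sequence[T]]], mask: List[int]) -> Iterator[List[List[T]]]:
    """Apply the same particle mask to every configuration of every batch.

    Only valid if the particle order (and thus each index's species) is
    identical in every step.
    """
    for batch in batches:
        yield [[config[i] for i in mask] for config in batch]
```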

[time independent data]: save PBC from Atoms

See `ZnH5MD/znh5md/znh5md/__init__.py`, line 157 in 6625f19:

pbc=True, # TODO: pbc should not always be true

More generally: how should time-independent data be handled?

Test on Windows

ZnH5MD doesn't seem to work on Windows operating systems.

fix encoding of boundary
check paths
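A hedged sketch for the boundary encoding item, assuming the problem is that the H5MD `boundary` attribute (whose entries are `"periodic"` or `"none"`) comes back as `bytes` instead of `str` on some setups. `boundary_to_pbc` is hypothetical:

```python
from typing import Iterable, List, Union


def boundary_to_pbc(boundary: Iterable[Union[str, bytes]]) -> List[bool]:
    """Convert an H5MD ``boundary`` attribute to a list of PBC flags.

    Assumed cause: entries may be read back as bytes depending on platform
    and library versions, so decode them before comparing.
    """
    values = [b.decode("utf-8") if isinstance(b, bytes) else str(b) for b in boundary]
    return [v == "periodic" for v in values]
```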

Add numpy option in addition / instead of TensorFlow

TensorFlow does far more than we require here. We could instead return NumPy arrays and use something like https://stackoverflow.com/questions/7323664/python-generator-pre-fetch for prefetching.
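One way to prefetch without TensorFlow is a background thread feeding a bounded queue, so that reading the next batch overlaps with processing the current one. `prefetch` is a hypothetical helper, shown as a sketch rather than a drop-in implementation:

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(source: Iterable[T], buffer_size: int = 2) -> Iterator[T]:
    """Yield items from ``source`` while a background thread reads ahead.

    At most ``buffer_size`` items are loaded in advance. Errors raised by
    ``source`` are re-raised in the consuming thread.
    """
    buffer: "queue.Queue" = queue.Queue(maxsize=buffer_size)

    def worker() -> None:
        try:
            for item in source:
                buffer.put((True, item))
        except Exception as err:  # forward the error to the consumer
            buffer.put((False, err))
            return
        buffer.put((False, None))  # end-of-data marker

    # daemon=True: an abandoned consumer does not keep the process alive
    threading.Thread(target=worker, daemon=True).start()
    while True:
        ok, value = buffer.get()
        if ok:
            yield value
        elif value is None:
            return
        else:
            raise value
```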

Store energy in observables

Currently, energy is stored in the `particles` group, but it would fit better into `/observables/Atoms/...`

Improvement List

Change List

~~species handling "particles//velocity". See MDSuite database or Espresso dump~~ no files available
#3
add `batch_size` as an argument to `get_dataset`
transpose to always be `(n_confs, n_atoms, n_dim)` via an argument
`loop_indices` should support slices, e.g. `[::2]` for every second configuration
default `prefetch` to the batch size
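A sketch of how `loop_indices` could accept slices, using `slice.indices` to resolve them against the number of configurations; `loop_indices_to_list` is a hypothetical helper:

```python
from typing import List, Optional, Sequence, Union


def loop_indices_to_list(loop_indices: Optional[Union[slice, Sequence[int]]], n_confs: int) -> List[int]:
    """Turn a ``loop_indices`` argument into explicit configuration indices."""
    if loop_indices is None:
        return list(range(n_confs))
    if isinstance(loop_indices, slice):
        # slice.indices clips start/stop/step to the valid range for n_confs
        return list(range(*loop_indices.indices(n_confs)))
    return list(loop_indices)
```

For example, `loop_indices_to_list(slice(None, None, 2), 7)` selects every second configuration, `[0, 2, 4, 6]`.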

Write Files

It could be possible to write to the H5MD database via `traj.positions = tf.data.Dataset`. A property setter is probably not sufficient, because one might want to pass additional arguments, so maybe something like `znh5md.update_database(property=traj.position, dataset=tf.data.Dataset, append=True)`.

maybe use pyh5md
include `ase.io.iread` compatibility (eventually make this an ASE feature)
`ase.io.write`

Some context managers for better performance

Cannot read LAMMPS files with charges through ASE

see https://gitlab.com/ase/ase/-/merge_requests/2387

use chemfiles instead

allow dynamic PBC
make each configuration a new particle group

slice_by_species: support symbol and number

Allow for:

dataset.position.slice_by_species(species="B")
dataset.position.slice_by_species(species=["B", "F"])

In the future, also support molecules / center of mass (COM).
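A sketch of the normalization step that would let `slice_by_species` accept symbols, numbers, or lists of either. `normalize_species` and the small lookup table are illustrative assumptions; `ase.data.atomic_numbers` holds the full symbol-to-number mapping:

```python
from typing import Dict, List, Sequence, Union

# Small illustrative subset; ase.data.atomic_numbers provides the full table.
ATOMIC_NUMBERS: Dict[str, int] = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "Na": 11, "Cl": 17}

Species = Union[str, int]


def normalize_species(species: Union[Species, Sequence[Species]]) -> List[int]:
    """Accept "B", ["B", "F"], 5 or [5, 9] and return atomic numbers."""
    if isinstance(species, (str, int)):
        species = [species]  # a single symbol or number becomes a one-item list
    return [ATOMIC_NUMBERS[s] if isinstance(s, str) else int(s) for s in species]
```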

zincware / znh5md