
cpctools's Introduction

cpctools


This project is no longer under development. Future work will continue in dynsight.

cpctools is a Python 3.8/3.9/3.10 library aimed at simplifying the analysis of Molecular Dynamics simulations.

cpctools stands for Computational Physical Chemistry TOOLS, or, if you prefer, for Chemical Physics Computational TOOLS.

It contains two packages, SOAPify and SOAPify.HDF5er.

cpctools uses h5py to store the trajectories, the SOAP fingerprints, and the analysis results in a binary format, facilitated by HDF5er.

The documentation is available on Read the Docs, where you can consult it for each available version of the package.

How To Install

To install the stable version of cpctools just type:

pip install cpctools

If you want to use dscribe or quippy for calculating the SOAP features you should install them separately, since they are quite heavy packages on their own, and usually you would use only one of these packages:

pip install "dscribe<=1.2.2,>1.2.0"
pip install "quippy-ase==0.9.10"

package: SOAPify

This package contains:

  • a toolbox to calculate the SOAP fingerprints of a system of atoms. Its main aim is to simplify the setup of the calculation. This toolbox depends on dscribe or quippy and unifies the output of the two codes.
  • a toolbox to calculate the distances between SOAP fingerprints
  • a simple analysis tool for trajectories of classified atoms

package: SOAPify.HDF5er

This package is a toolbox to create hdf5 files with h5py from the trajectory and topology files. The format we use does not align with h5md.

Our format is designed to speed up the calculations without occupying too much RAM, thanks to the hdf5 dataset chunking capabilities.

  • The data within the files are organized into Group categories:
    • "Trajectories" contains subgroups that represent the various stored trajectories. Each trajectory subgroup contains three datasets:
      • "Types" contains the types of atoms in the simulation
      • "Box" contains the history of the box dimensions
      • "Trajectory" contains the history of the particle positions
    • "SOAP" contains the Datasets of the calculated SOAP fingerprints. Each SOAP Dataset carries attributes with the settings needed to reproduce the results.
    • "Classification" contains a group per trajectory. The format of the Datasets contained within is not fixed.
  • The user can choose to use a single file per project or to store the results of the various analysis steps in separate files (the recommended option). SOAPify.HDF5er also contains a tool for exporting the trajectories from the hdf5 file to the extended xyz format, compatible with ovito.
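
The layout described above can be sketched with h5py as follows. This is only an illustration, not the code used by SOAPify.HDF5er: the trajectory name "demo", the array shapes, the chunk size, the six-value box and the random data are all assumptions made for the example. The attribute names l_max and n_max are the ones that appear in the traceback reported in the issues below.

```python
import h5py
import numpy as np

n_frames, n_atoms = 10, 4
rng = np.random.default_rng(0)

# In-memory file: nothing is written to disk, the file name is arbitrary
with h5py.File("layout_demo.hdf5", "w", driver="core", backing_store=False) as f:
    # "demo" is a hypothetical trajectory name
    traj = f.create_group("Trajectories/demo")
    # Atom types, stored here as fixed-length byte strings
    traj.create_dataset("Types", data=np.array([b"Au"] * n_atoms))
    # Box history: assumed to be six values per frame (three lengths, three angles)
    box = np.tile([10.0, 10.0, 10.0, 90.0, 90.0, 90.0], (n_frames, 1))
    traj.create_dataset("Box", data=box)
    # Position history, chunked along the frame axis so that a few frames
    # can be read at a time without loading the whole trajectory in RAM
    traj.create_dataset(
        "Trajectory",
        data=rng.random((n_frames, n_atoms, 3)),
        chunks=(5, n_atoms, 3),
    )
    # SOAP fingerprints, with the settings stored as attributes of the dataset
    soap = f.create_dataset("SOAP/demo", data=rng.random((n_frames, n_atoms, 8)))
    soap.attrs["l_max"] = 2
    soap.attrs["n_max"] = 2
    # One classification group per trajectory; its content is not fixed
    f.create_group("Classification/demo")

    # Reading a slice only touches the chunks that contain it
    first_frames = f["Trajectories/demo/Trajectory"][:5]
    top_level_groups = sorted(f.keys())
    stored_l_max = int(f["SOAP/demo"].attrs["l_max"])
```

Chunking along the frame axis is one possible choice: it favours reading a block of consecutive frames, which is the access pattern of a frame-by-frame analysis.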

cpctools's People

Contributors

iximiel, andrewtarzia, tdewaard

cpctools's Issues

Refactor tests

I must refactor the tests to make them more rational and to remove a lot of code repetition.

BUG in SOAP classify

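There is a bug at line 112 of SOAPClassify.py!

This should be addressed, and a test should catch this error. In the line quoted below, the second comparison checks x[0].nmax against i.lmax, where a comparison between the two lmax values was presumably intended: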

if x[0].nmax != i.nmax or x[0].nmax != i.lmax:
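
A regression test for this kind of slip could be built on a sketch like the one below. The `SOAPSettings` class and both helper functions are hypothetical stand-ins written for this example, not the objects used in SOAPClassify.py. The second helper only mimics the quoted condition, to show which inputs expose it.

```python
from dataclasses import dataclass


@dataclass
class SOAPSettings:
    """Hypothetical stand-in for an object carrying the SOAP settings."""

    nmax: int
    lmax: int


def settings_match(reference: SOAPSettings, other: SOAPSettings) -> bool:
    """Return True only if both nmax and lmax agree."""
    return reference.nmax == other.nmax and reference.lmax == other.lmax


def settings_match_buggy(reference: SOAPSettings, other: SOAPSettings) -> bool:
    """Mimic the quoted condition: nmax is compared against the other's lmax."""
    return not (reference.nmax != other.nmax or reference.nmax != other.lmax)


# Identical settings with nmax != lmax: the buggy check rejects them
a = SOAPSettings(nmax=8, lmax=4)
b = SOAPSettings(nmax=8, lmax=4)
# Different lmax, but nmax equals the other's lmax: the buggy check accepts them
c = SOAPSettings(nmax=8, lmax=8)

assert settings_match(a, b) and not settings_match_buggy(a, b)
assert not settings_match(a, c) and settings_match_buggy(a, c)
```

Test inputs with nmax equal to lmax cannot tell the two checks apart, which may be why such a bug goes unnoticed.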

Add mda-to-extxyz

Like title, need to add a way to convert mda/ase trajectories to extended xyz

Add origin to HDF5 files

Extended xyz and other formats support the origin of the box.
We should save this information.

Steps to do:

  • Import Origin From ase and MDA
  • Export Origin to xyz and ase/MDA

Transition matrices should have the error column optional

As title:
The current behavior of transitionMatrixFromSOAPClassification adds an extra column to the states, which is useful only if the user passes -1 as a state to represent errors.
A workaround is to print the transition matrix with [:-1,:-1], but I think I need to make this clearer to the user.
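
One way to make the error column optional is sketched below. The `transition_matrix` function, its `include_errors` flag and its signature are assumptions made for this example, not the interface of transitionMatrixFromSOAPClassification. It returns raw transition counts, without normalization.

```python
import numpy as np


def transition_matrix(states, n_states, include_errors=False):
    """Count the transitions between consecutive frames for every atom.

    states: integers of shape (n_frames, n_atoms); -1 marks an error.
    """
    states = np.asarray(states)
    # Map -1 to the extra last index, so that errors get their own row/column
    idx = np.where(states < 0, n_states, states)
    counts = np.zeros((n_states + 1, n_states + 1), dtype=int)
    # Accumulate one count per (state at frame t, state at frame t+1) pair
    np.add.at(counts, (idx[:-1].ravel(), idx[1:].ravel()), 1)
    # Same effect as the [:-1, :-1] workaround, but chosen by the caller
    return counts if include_errors else counts[:-1, :-1]


# Three frames, two atoms; the second atom has an error at the second frame
trajectory = [[0, 1], [0, -1], [1, 1]]
without_errors = transition_matrix(trajectory, n_states=2)
with_errors = transition_matrix(trajectory, n_states=2, include_errors=True)
```

With the flag off, the transitions into and out of the error state are simply dropped from the returned matrix.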

Allow for atom selection in functions for easier usability

Hello
I posted an issue on the GitHub repository of the paper (GMPavanLab/LENS#1) and was pointed towards this repository.
I can see that some of the improvements have already been implemented here, but I think that introducing atom selections in the function call might improve the versatility of the code.

My suggestion is to make the following change to the function "listNeighboursAlongTrajectory":
From:

def listNeighboursAlongTrajectory(
    inputUniverse: Universe, cutOff: float, trajSlice: slice = slice(None)
) -> "list[list[AtomGroup]]":
    nnListPerFrame = []
    for ts in inputUniverse.universe.trajectory[trajSlice]:
        nnListPerAtom = []
        nnSearch = AtomNeighborSearch(inputUniverse.atoms, box=inputUniverse.dimensions)
        for atom in inputUniverse.atoms:
            nnListPerAtom.append(nnSearch.search(atom, cutOff))
        nnListPerFrame.append([at.ix for at in nnListPerAtom])
    return nnListPerFrame

To:

def listNeighboursAlongTrajectory(
    inputUniverse: Universe, cutOff: float, trajSlice: slice = slice(None), atom_sel: str = "all"
) -> "list[list[AtomGroup]]":
    nnListPerFrame = []
    for ts in inputUniverse.universe.trajectory[trajSlice]:
        nnListPerAtom = []
        nnSearch = AtomNeighborSearch(inputUniverse.select_atoms(atom_sel), box=inputUniverse.dimensions)
        for atom in inputUniverse.select_atoms(atom_sel):
            nnListPerAtom.append(nnSearch.search(atom, cutOff))
        nnListPerFrame.append([at.ix for at in nnListPerAtom])
    return nnListPerFrame

This would allow easy selections of atoms in the universe while defaulting to all atoms.

Can't open attribute (can't locate attribute: 'l_max') when running timeSOAP.ipynb

Describe the bug
KeyError Traceback (most recent call last)
/nfs/scistore14//timesoap/SOAPify/Examples/timeSOAP.ipynb Cell 7 line 2
24 slide = 1
26 return nAt, timedSOAP, np.diff(timedSOAP.T, axis=-1)
---> 29 nAtoms, tSOAP, dtSOAP = getTimeSOAP(soapFileName, trajAddress)

/nfs/scistore14//timesoap/SOAPify/Examples/timeSOAP.ipynb Cell 7 line 6
4 print(f)
5 ds = f[f"/SOAP/{trajAddress}"]
----> 6 fillSettings = getSOAPSettings(ds)
7 print(fillSettings)
8 print(ds.shape)

File ~/software/soapify/lib/python3.10/site-packages/SOAPify/utils.py:345, in getSOAPSettings(fitsetData)
328 """Gets the settings of the SOAP calculation
329
330 you can feed directly this output to :func:fillSOAPVectorFromdscribe
(...)
342
343 """
344 print(fitsetData)
--> 345 lmax = fitsetData.attrs["l_max"]
346 nmax = fitsetData.attrs["n_max"]
347 symbols, atomicSlices = getSlicesFromAttrs(fitsetData.attrs)
...
File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5a.pyx:80, in h5py.h5a.open()

KeyError: "Can't open attribute (can't locate attribute: 'l_max')"

Additional context
I downloaded the data with "wget https://github.com/GMPavanLab/dynNP/releases/download/V1.0-trajectories/ico309.hdf5" and put the data file in the same directory, with trajAddress = './'.
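
A more explicit failure for a missing setting could look like the sketch below. The two helper functions are hypothetical and are not part of SOAPify. The names l_max and n_max come from the traceback above, while the dictionaries, including the r_cut, lmax and nmax keys, are invented stand-ins for `dataset.attrs`. The sketch does not diagnose why the attribute is missing from the downloaded file.

```python
from collections.abc import Mapping

REQUIRED_ATTRS = ("l_max", "n_max")  # names taken from the traceback above


def missing_soap_attrs(attrs: Mapping) -> list:
    """Return the required attribute names that are absent from `attrs`."""
    return [name for name in REQUIRED_ATTRS if name not in attrs]


def read_soap_settings(attrs: Mapping) -> dict:
    """Read l_max and n_max, failing with an explicit message if one is missing."""
    missing = missing_soap_attrs(attrs)
    if missing:
        raise KeyError(
            f"SOAP dataset lacks the attributes {missing}; "
            f"available attributes: {sorted(attrs)}"
        )
    return {name: int(attrs[name]) for name in REQUIRED_ATTRS}


# Plain dictionaries stand in for `dataset.attrs`; their content is made up
complete = {"l_max": 8, "n_max": 8, "r_cut": 4.5}
incomplete = {"lmax": 8, "nmax": 8}

settings = read_soap_settings(complete)
absent = missing_soap_attrs(incomplete)
```

Listing the attributes that are actually present makes it easier to see whether the file holds the settings under other names or not at all.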

Changed the dscribe behaviour

dscribe 1.2.1 returns an array of shape (nat,nsoap) instead of shape (1,nat,nsoap) when only 1 frame is analysed!
This breaks the saponifyWorker function.
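
A defensive way to handle both shapes is to restore the frame axis before using the array, as in the sketch below. The `as_frames_atoms_features` helper is hypothetical. It operates on plain NumPy arrays and does not call dscribe.

```python
import numpy as np


def as_frames_atoms_features(soap_output):
    """Return the SOAP array with an explicit leading frame axis."""
    arr = np.asarray(soap_output)
    if arr.ndim == 2:
        # A single frame returned as (nat, nsoap): add the frame axis
        arr = arr[np.newaxis, ...]
    if arr.ndim != 3:
        raise ValueError(f"expected a 2D or 3D array, got shape {arr.shape}")
    return arr


single = np.zeros((5, 12))    # (nat, nsoap), as reported for one frame
multi = np.zeros((3, 5, 12))  # (nframes, nat, nsoap)

single_fixed = as_frames_atoms_features(single)
multi_fixed = as_frames_atoms_features(multi)
```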

Box not updating

Describe the bug
The box size does not update within each trajChunkSize when the .hdf5 file is created by calling the MDA2HDF5 function.
A possible workaround is to set trajChunkSize=1, so that the box updates at every step, but extracting the SOAP is then slow.

Creating an easier interface for hdf5 files

This should be the 0.1 feature:

  • A way for the user to get the trajectories out of the hdf5 files and interact with them more easily
  • A tool that reads the groups and makes some assumptions about their content, to show the user the content of the files more quickly
  • I should not over-engineer this
  • Maybe create a reader compatible with MDA

Mismatching in labels of clusters and transition matrix

Describe the bug

In SOAPify/Examples/LENS.ipynb, tmat labels are given incorrectly when the clusters assigned by KMeans are not in order (e.g.: [C0=0, C2=2, C1=1]).
The output of calculateTransitionMatrix is a matrix with columns and rows corresponding to ordered clusters (e.g. for columns: C0=0 in col 0, C1=1 in col 1, C2=2 in col 2 ...) while the label assignment is given depending on the cluster order (C0 for col 0, C2 for col 1, C1 for col 2).
The problem is fixed by sorting the labels, from:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in minmax]
)

to:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in np.sort(minmax, axis=0)]
)

To reproduce the bug, change the random_state parameter in KMeans (and thus the cluster assignment order): the exchange probabilities change with it.
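
The mismatch can be illustrated without the notebook data. In the sketch below the cluster ids are hypothetical values that take the role of m[0] in the snippets above, and they are assumed to be the contiguous integers 0..n-1, so that row i of the matrix refers to cluster i.

```python
# Cluster ids in the order in which they were assigned (hypothetical values)
cluster_ids_in_assignment_order = [0, 2, 1]

# Labels built in assignment order: position 1 is labelled "C2",
# although row/column 1 of the matrix refers to cluster 1
labels_unsorted = [f"C{c}" for c in cluster_ids_in_assignment_order]

# Labels built from the sorted ids line up with the matrix rows and columns
labels_sorted = [f"C{c}" for c in sorted(cluster_ids_in_assignment_order)]

# Positions where the unsorted label disagrees with the matrix index
mismatched = [
    (index, label)
    for index, label in enumerate(labels_unsorted)
    if label != f"C{index}"
]
```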
