uibcdf / molsysmt

Open source library to work with molecular systems

Home Page: https://www.uibcdf.org/MolSysMT/

License: Other

Python 84.95% Shell 0.01% Fortran 0.78% Jupyter Notebook 4.44% TeX 9.83%

molecular-dynamics molecular-dynamics-simulation molecular-modeling molecular-simulation python molecular-dynamics-trajectories molsysmt

molsysmt's Introduction

MolSysMT

Installation | Documentation | License | Credits | Team

Molecular Systems Multi-Tool

This library was conceived as a humble frontend to make life easier for a computational molecular biology lab, the UIBCDF. MolSysMT is designed to cover specific needs, or to speed up workflows, when you are working with tools such as:

MDTraj
MDAnalysis
PDBFixer
OpenMM
Yank
HTMD
PyEmma
ParmEd
NGLview
pdbtools?

Although MolSysMT was not conceived to do what other tools do better, this toolkit can be used on its own for a few simple tasks.

All credit should be given to the developers and maintainers of these packages and the libraries they depend on.

Molecular Systems:

Everything about creating the molecular system, together with its parameters and topology, should go here.
This should also be ready to work with molecules such as ligands.

Add ParmEd and OpenBabel: parmed.github.io http://openbabel.org/wiki/Main_Page

pdbfixer

Molecular Dynamics:

Installation

Dependencies

- Fortran compiler (gfortran or the Intel Fortran compiler)
- LAPACK (would 'conda install lapack' work?)

Other Python packages, such as those mentioned here (link to section) and included in this list (file).

Conda

Updating

GitHub

git clone git@github.com:UIBCDF/MolSysMT.git
cd MolSysMT
python setup.py develop

pip uninstall molsysmt

Updating

To be written

Documentation

http://www.uibcdf.org/MolSysMT/

License

Credits

All credit should be given to the developers and maintainers of the following tools and dependencies:

...

Team

Project leads

Diego Prada Gracia
Liliana M. Moreno Vargas

Contributors

...

Citation

Latest version DOI:

Cite the latest version with the following DOI provided by Zenodo:

Cite all versions?

You can cite all versions by using the following DOI. This DOI represents all versions, and will always resolve to the latest one:

Acknowledgments and Copyright

Copyright

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.5.

molsysmt's People

Contributors

Stargazers

Watchers

Forkers

dprada daniel-ibarrola lmmv

molsysmt's Issues

Some digestion methods need to be implemented

Some digestion methods were included as empty functions. They need to be implemented:

digest_box
digest_comparison
digest_coordinates

And some need to be completed or adjusted:

digest_engine
digest_selection

Secondary structure prediction for sequence segments

Can MolModMT include a method to predict the propensity of a given sequence segment to form secondary structures? Something similar to what APSSP server does, or any other similar tool.

Hbonds output formats

What would be the best way to encode the list of hydrogen bonds found in a trajectory?

A dictionary of lists (with frame indices where the bond was formed)
A sparse tensor (3d with the library "Sparse")
An array of arrays with different shape (number of frames x lists of detected hbonds)
A list of sparse matrices (as mdtraj)
A native Hbond class where the info is stored as a sparse tensor, with methods to quickly export it to different formats.
...

This is not clear now.
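
To make the comparison more concrete, here is a minimal sketch of the first option (a dictionary of lists) and of how it could be turned into the per-frame layout. The atom indices and the helper name `hbonds_by_frame` are made up for illustration; none of this is existing MolSysMT API.

```python
# Hypothetical data: (donor_index, acceptor_index) -> frames where the bond exists.
hbonds = {
    (12, 47): [0, 1, 2, 5],
    (30, 81): [2, 3],
}
n_frames = 6

def hbonds_by_frame(hbonds, n_frames):
    """Invert the dictionary into one list of bonds per frame."""
    per_frame = [[] for _ in range(n_frames)]
    for pair, frames in hbonds.items():
        for frame in frames:
            per_frame[frame].append(pair)
    return per_frame

per_frame = hbonds_by_frame(hbonds, n_frames)

# Fraction of frames in which each bond is present.
occupancy = {pair: len(frames) / n_frames for pair, frames in hbonds.items()}
```

The dictionary form makes per-bond questions (such as occupancy) cheap, while the inverted form answers per-frame questions. A native class could keep one representation and offer the others as export methods.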

Proteins, small molecules and entities names

MolSysMT has to give a name to proteins, small molecules, and entities. These names can be extracted from an mmtf file, but what happens with other forms? We probably need to implement in Sabueso the tools to find these names together with other attributes from files and databases.

Do we need a MolecularSystemNeeded error?

In @Daniel-Ibarrola's PR the following code in basic/set.py:

    if check:

        if not is_molecular_system(molecular_system):
            raise MolecularSystemNeeded()

was replaced by:

    if check:

        if not is_molecular_system(molecular_system):
            raise TypeError("A molecular system is needed.")

I guess that, at this point, the current version of this process should be something like:

    if check:
        digest_single_molecular_system(molecular_system)

And it is inside the digestion where the error should be raised. This needs to be reviewed. Do we already have a MolecularSystemNeeded error to be raised in this situation?
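
One possible shape for this, as a sketch only: a dedicated exception that subclasses TypeError, raised from inside the digestion function. The class name `MolecularSystemNeededError` and the stand-in `is_molecular_system` below are assumptions made for illustration; the real check is the one already in MolSysMT.

```python
class MolecularSystemNeededError(TypeError):
    """Raised when an argument is not a recognised molecular system."""

    def __init__(self, argument_name="molecular_system"):
        super().__init__(f"The argument '{argument_name}' must be a molecular system.")


def is_molecular_system(obj):
    # Stand-in check for illustration only: any non-empty string or list passes.
    return isinstance(obj, (str, list)) and len(obj) > 0


def digest_single_molecular_system(molecular_system):
    """Return the input unchanged, or raise if it is not a molecular system."""
    if not is_molecular_system(molecular_system):
        raise MolecularSystemNeededError()
    return molecular_system
```

Because the exception derives from TypeError, code written against the replacement shown above (`except TypeError`) would keep working, while callers that want to be specific can catch the dedicated class.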

A method is needed to create biological assemblies.

In the Protein Data Bank, a single structure can correspond to more than one molecular system. Some structures have proposed biological assemblies (#32). In this case, the pdb or mmtf file contains a unit molecular system and a list of geometrical transformations to obtain the assemblies. A possible way to handle this situation is the following:

The mmtf or pdb is always converted without applying any geometrical transformation.
A warning message is printed if the PDB entry has more than one molecular system (that is, there are geometrical transformations to create biological assemblies).
A method has to be included in MolSysMT that can extract these geometrical transformations from mmtf files, pdb files or PDB ids, and produce a new molecular system representing a particular biological assembly.
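
The geometrical part of the last step can be sketched in plain Python, assuming each transformation is given as a 3x3 rotation matrix plus a translation vector. The function names and the sample data are hypothetical, and reading the transformations from the files is left out.

```python
def apply_transformation(coordinates, rotation, translation):
    """Return new coordinates r' = R r + t for every atom."""
    transformed = []
    for x, y, z in coordinates:
        transformed.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return transformed


def build_assembly(coordinates, transformations):
    """Concatenate one transformed copy of the unit per transformation."""
    assembly = []
    for rotation, translation in transformations:
        assembly.extend(apply_transformation(coordinates, rotation, translation))
    return assembly


# Illustrative input: the identity, and a 180 degree rotation about z shifted along z.
identity = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
rotated = ([[-1, 0, 0], [0, -1, 0], [0, 0, 1]], (0.0, 0.0, 5.0))
unit = [(1.0, 0.0, 0.0)]
assembly = build_assembly(unit, [identity, rotated])
```

An assembly described by several transformations simply yields several copies of the unit, so the number of atoms grows with the length of the transformation list.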

Having our own trajectory file parsers would be more efficient

MolSysMT should have its own file parsers. If not for every file format, at least for trajectory files (including pdbs).

Are former auxiliary functions in private module `lists_and_tuples` necessary?

The following functions in a former version of _private/lists_and_tuples (see the file) are probably called by code waiting to be re-implemented. This needs to be checked before confirming their removal.

def list_to_csv_string(obj):

    return ",".join([str(ii) for ii in obj])

def list_to_ssv_string(obj):

    return " ".join([str(ii) for ii in obj])

def are_equal_sets(objects1, objects2):

    if not is_list_or_tuple(objects1):
        objects1=[objects1]
    if not is_list_or_tuple(objects2):
        objects2=[objects2]

    output = False
    if set(objects1)==set(objects2):
        output = True

    return output

Removing todo.txt

The file todo.txt is no longer useful. Any pending task should be reported using the GitHub issues board and/or the GitHub Project. See #54 (comment)

A test battery is mandatory to make continuous integration

A test battery has to be finished and put in place to enable continuous integration. It doesn't matter if testing takes a long time and is not efficient in its first version. This has to be done soon.

The argument 'check' needs to be removed (now we have @digest).

@Daniel-Ibarrola implemented a new mechanism to check the input arguments of MolSysMT's functions and classes (see #71). Now it's time to remove the old argument "check" from almost everywhere.

Can the omnipresent argument "check" be avoided?

Having the argument "check" everywhere in the code is annoying. Necessary at this point of the development, but probably avoidable.

For example, there could be a concise way to check the validity of a function's input arguments only when the function is invoked from outside the library's code (a script, a Jupyter session, etc.). Can this be implemented in a simple way?

We should also consider the following situation. If checking input arguments in functions is computationally expensive, should we give the option to the user not to do it?

New class iterator in the module `structure` to work with trajectories.

In my opinion, the only important functionality left to implement before publishing this is something to work with trajectories. We could probably discuss here how this tool could be shaped.

I would like to avoid additional auxiliary classes like those most libraries have (mdtraj or MDAnalysis). In my opinion the power of the tool should be in a single place, an iterator.

What do we need? We need a tool to extract frames (step, time, coordinates, and boxes) in a smart and simple way when these structures are stored in a trajectory file. However, its use should not be restricted to trajectory files.

As a user, I would like to do something like this:

import molsysmt as msm

iterator = msm.structure.iterator('traj.h5', start=100, interval=10, stop=200, selection='atom_name=="CA"')

for step, time, coordinates, box in iterator:
     print(step, time, coordinates, box)

iterator.close()

import molsysmt as msm

iterator = msm.structure.iterator(['traj.gro', 'traj.xtc'], indices=[10, 22, 30, 78, 145], selection=[10, 11, 12])

for step, time, coordinates, box in iterator:
     print(step, time, coordinates, box)

iterator.close()

import molsysmt as msm

iterator = msm.structure.iterator('traj.dcd', start=1, interval=10, stop=1000, chunk_size=20)

for _, chunk_times, chunk_coordinates, _ in iterator:
     print(chunk_times, chunk_coordinates)

iterator.close()

This iterator could have the following instantiation arguments:

molecular_system: The standard molecular system input.
start: first structure index of the trajectory to start with. [default: 0]
interval: number of structure indices to advance in each iteration. [default: 1]
stop: the iteration finishes if the current structure index is larger than or equal to this integer. [default: None]
chunk_size: number of structures in the output of each iteration. [default: 1]
selection: atoms selection [default: 'all']
syntaxis: syntaxis used in selection [default: 'MolSysMT']

I am not sure whether, to build this iterator, we need to define a specific structures_iterator in the module item for each form (a trajectory file or, in general, any form with structures).

@Daniel-Ibarrola: What do you think? Do you have an alternative proposal?
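
To discuss the semantics of start, interval, stop and chunk_size before deciding on per-form backends, they can be prototyped on in-memory data. The sketch below is only that: the function name `iterate_structures` is hypothetical, atom selection and boxes are left out, and it yields structure indices instead of steps.

```python
def iterate_structures(times, coordinates, start=0, interval=1, stop=None, chunk_size=1):
    """Yield (indices, times, coordinates) in chunks of `chunk_size` structures."""
    if stop is None:
        stop = len(times)
    # Structure indices to visit; `stop` is exclusive, as in the proposal.
    indices = range(start, min(stop, len(times)), interval)
    for first in range(0, len(indices), chunk_size):
        chunk = indices[first:first + chunk_size]
        yield (
            list(chunk),
            [times[ii] for ii in chunk],
            [coordinates[ii] for ii in chunk],
        )


# Illustrative data: 10 structures of a single atom.
times = [float(ii) for ii in range(10)]
coordinates = [[(float(ii), 0.0, 0.0)] for ii in range(10)]

for chunk_indices, chunk_times, chunk_coordinates in iterate_structures(
        times, coordinates, start=1, interval=3, stop=9, chunk_size=2):
    print(chunk_indices, chunk_times)
```

Writing it as a generator keeps all the state in one place, which is in line with avoiding auxiliary classes. The last chunk may be shorter than chunk_size when the number of selected structures is not a multiple of it.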

Implementing or removing digest_box in digest/box.py

As @Daniel-Ibarrola pointed out, the function digest_box from _digestion/box.py needs to be implemented.

Pytraj objects, syntaxis and engines in MolSysMT

We have to start implementing support for Pytraj objects, syntaxis, and engines.

Does a second molecular system make sense as input argument of some methods?

Some methods have the option to include a second molecular system as an input argument (distance, for instance). But does this make everything more cumbersome? It is probably unnecessary. There is already a method to merge molecular systems in case we need to calculate anything using two molecular systems.

Extract mmtf_MMTFDecoder from a mmtf_MMTFDecoder

A subroutine to extract a mmtf_MMTFDecoder from a mmtf_MMTFDecoder with an atoms list is needed.

Error when importing MolModMT in Jupyter-lab

When I imported MolModMT in jupyter-lab I got the following error:

import os
import mdtraj as md
from pdbfixer import PDBFixer
import pdbfixer as pdb
from simtk.openmm.app import PDBFile
import molmodmt as mt

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/amber.py in find_gaff_dat()
    272         try:
--> 273             AMBERHOME = os.path.split(full_path)[0]
    274             AMBERHOME = os.path.join(AMBERHOME, "../")

~/miniconda3/envs/AnalysisMD/lib/python3.6/posixpath.py in split(p)
    106     everything after the final slash.  Either part may be empty."""
--> 107     p = os.fspath(p)
    108     sep = _get_sep(p)

TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-4-db124cf99dc6> in <module>
      4 import pdbfixer as pdb
      5 from simtk.openmm.app import PDBFile
----> 6 import molmodmt as mt

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/__init__.py in <module>
----> 1 from .multitool import *
      2 from . import utils
      3 from . import moldyn as moldyn
      4 from . import molsys as molsys
      5 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/multitool.py in <module>
     10 
     11 ## Classes
---> 12 from .formats.classes import dict_is_form as _dict_classes_is_form, \
     13     list_forms as _list_classes_forms, \
     14     dict_converter as _dict_classes_converter, \

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/formats/classes/__init__.py in <module>
----> 1 from .base import *

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/formats/classes/base.py in <module>
     18 
     19 for api_form in list_api_forms:
---> 20     module_api_form=_import_module('.'+api_form,base_package)
     21     form_name=module_api_form.form_name
     22     list_forms.append(form_name)

~/miniconda3/envs/AnalysisMD/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/formats/classes/api_yank_Topography.py in <module>
      1 from os.path import basename as _basename
----> 2 from yank import Topography as _yank_Topography
      3 
      4 form_name=_basename(__file__).split('.')[0].replace('api_','').replace('_','.')
      5 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/yank/__init__.py in <module>
     12 from . import utils
     13 from . import multistate
---> 14 from . import restraints
     15 from . import pipeline
     16 from . import experiment

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/yank/restraints.py in <module>
     35 from simtk import openmm, unit
     36 
---> 37 from . import pipeline
     38 from .utils import methoddispatch, generate_development_feature
     39 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/yank/pipeline.py in <module>
     29 import numpy as np
     30 import openmmtools as mmtools
---> 31 import openmoltools as moltools
     32 from pdbfixer import PDBFixer
     33 from simtk import openmm, unit

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/__init__.py in <module>
     24 """
     25 
---> 26 from openmoltools import amber_parser, system_checker, utils, packmol, openeye, amber, cirpy, gromacs, schrodinger

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/openeye.py in <module>
      3 import mdtraj as md
      4 from openmoltools.utils import import_, enter_temp_directory, create_ffxml_file
----> 5 from openmoltools.amber import run_antechamber
      6 import logging
      7 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/amber.py in <module>
    281     return os.path.join(AMBERHOME, 'dat', 'leap', 'parm', 'gaff.dat')
    282 
--> 283 GAFF_DAT_FILENAME = find_gaff_dat()
    284 
    285 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/amber.py in find_gaff_dat()
    274             AMBERHOME = os.path.join(AMBERHOME, "../")
    275         except:
--> 276             raise(ValueError("Cannot find AMBER GAFF"))
    277 
    278     if AMBERHOME is None:

ValueError: Cannot find AMBER GAFF

It looks like it is something related to the AMBER HOME path. Should I install Amber to fix the problem? I am managing MolModMT via Conda.

Learn how projects work in GitHub

Include rdkit molecular forms

Including rdkit forms is necessary to use MolSysMT in OpenPharmacophore. The classes to be included are:

rdkit.Chem.rdchem.Mol

An auxiliary library to cite everything that was used along a workflow has to be included

If MolSysMT is going to be used as support to build workflows, an auxiliary library to cite everything that was used along that workflow has to be included. If this library does not exist already, we may have to implement it.

Documentation restructuration

The structure of the documentation needs to be cleaned and reorganized.
This is urgent since other members of the lab need to start using MolSysMT.

There are faster libraries to make queries in a DataFrame

There are faster libraries than Pandas. The concept of having the topology and the bonds as DataFrames is useful since we can, at any time, try other libraries such as Vaex, PySpark, Dask, ..., which seem to be more efficient than Pandas. We have to adopt whichever is fastest at doing queries in large systems. This is one of the strongest points of MolSysMT: the possibility of being the fastest library at loading molecular systems and doing atom selections (in small and very large systems), together with the fact that the topology is stored in a standard type of object used by many libraries.

basic.contains() function should work on attributes and selections

The method 'contains' should be refactored to work on attributes and selections.

For example, the following three commands should be equivalent:

msm.contains(molecular_system, waters=True)

msm.contains(molecular_system, n_waters=True)

msm.contains(molecular_system, selection='molecule_type=="water"')

All conversions writing a file have to have an output

All conversions to a file form have to return the absolute path of the new file.
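
A minimal sketch of that convention, using a hypothetical writer function (`write_text_file` is not part of MolSysMT) and a temporary directory so the example leaves nothing behind:

```python
import os
import tempfile
from pathlib import Path


def write_text_file(content, output_filename):
    """Write `content` and return the absolute path of the new file as a string."""
    path = Path(output_filename).resolve()
    path.write_text(content)
    return str(path)


# Usage: the returned value is an absolute path that points to the new file.
with tempfile.TemporaryDirectory() as tmp_dir:
    returned = write_text_file("HEADER\n", os.path.join(tmp_dir, "system.pdb"))
    is_absolute = os.path.isabs(returned)
    existed = os.path.isfile(returned)
```

Returning the resolved path lets the output of one conversion be passed directly as the input of the next one, whatever the working directory is.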

Including extended PDB and CCD ID codes

The Protein Data Bank will extend the 4-character PDB ID coding system to longer ID strings. We have to check the new formats to be included in MolSysMT.

From the Protein Data Bank: "Future Planning: Entries with extended PDB and CCD ID codes will be distributed in PDBx/mmCIF format only."

Other Python libraries to be checked

List here other python libraries to be checked.

Faster graph tools

NetworkX is used as the graph (network) library to get the components, the bonds of a selection, etc. (check also msm.bondgraph). But NetworkX is currently one of the slowest graph libraries available. With the grant recently received from the Chan Zuckerberg Initiative this may change in the future, but we should consider working with other libraries such as igraph, graph-tool, snap or networking. We should probably have a single set of methods with an engine switcher.

Elements proposal

New group types:

cosolute
small molecule

A molecular system can be a set of items, not only one.

A molecular system can be defined by a set of items, not only one.
This means that all methods should be able to work with a list of items, not just one.
For instance, having one item for the topology, a second item for the coordinates, and a third item for the box requires auxiliary methods to deal with them.
In addition, this has to be documented as something implicit in the philosophy of MolSysMT.

Other Python libraries to be checked

Opened issue to list here other Python libraries to be checked.

Adding functions to get and add missing bonds

The need for two functions to get and add missing bonds from/to a molecular system arises when working with the demo files that come with NGLView: the gro file has no bonds and the pdb bonds are not properly processed.

The way to include or detect covalent bonds should be similar to what is included in pytraj and probably in ParmEd (this last library should be checked).

These two functions should be added to the module build.

Removing Untitled.ipynb

There is a file named Untitled.ipynb that does nothing. See #54 (comment)

Exceptions reporting more information

We would like to have exceptions returning more information. For example (see this former version of not_implemented_version_error.py):

The name of the method raising the exception.
The name of the wrong argument
...

For instance, if the user asks for a conversion that is not implemented, raising an exception with the following message would be informative:

 message = (
                f"The conversion from {from_form} to {to_form} in \"{caller_name}\" has not been implemented yet. "
                f"Check {api_doc} for more information. "
                f"Write a new issue in {__github_issues_web__} asking for its implementation."
                )

In order to do this, as @Daniel-Ibarrola pointed out in #54 (comment), we need to work properly with the function stack from the module inspect. Example:

def __init__(self, from_form, to_form):

        from molsysmt import __github_issues_web__
        from inspect import stack

        all_stack_frames = stack()
        caller_stack_frame = all_stack_frames[1]
        caller_name = caller_stack_frame[3]

        api_doc = ''

        message = (
                f"The conversion from {from_form} to {to_form} in \"{caller_name}\" has not been implemented yet. "
                f"Check {api_doc} for more information. "
                f"Write a new issue in {__github_issues_web__} asking for its implementation."
                )

        super().__init__(message)

Function to convert mmtf files to mdtraj needs to be reviewed.

The code before PR#54 was buggy (see this conversation and this conversation in the PR mentioned).

This file needs a second round. Does MDTraj have its own function already implemented?

Change the units library

The library to work with units should probably be changed from simtk.unit to pint or unyt. See issue openmm/openmm#2966

There is a conflict to build the package for Linux 64 and Python 3.8

The conflict is produced by pytraj for Python 3.8. We have to watch for a new pytraj release.

Comparing a selection (taking the value of a numpy ndarray) with 'all'

Throughout the code, selection is compared with 'all' in many places, in lines such as:

    if indices is None:
        if selection != 'all':
            indices = select(molecular_system, element=element, selection=selection,
                    syntaxis=syntaxis, check=False)
        else:
            indices = 'all'

But this code gives the following warning if selection is a numpy ndarray:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

Is this a potential source of errors in the future?
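
One way to sidestep the warning, sketched here with a hypothetical helper name (`is_all`), is to test the type before comparing. When the value is not a string, the `==` comparison is never evaluated, so no elementwise comparison is attempted on arrays.

```python
def is_all(value):
    """Return True only if `value` is the string 'all'."""
    return isinstance(value, str) and value == 'all'


# Lists (and, by the same logic, numpy arrays) are never compared with 'all'.
print(is_all('all'))                 # True
print(is_all([0, 1, 2]))             # False
print(is_all('atom_name=="CA"'))     # False
```

With such a helper, the block above would read `if not is_all(selection):` and would behave the same for strings, lists, tuples and arrays.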

Decorators should be included in the code to make it more readable

Decorators should be included in the code to work with the digestion of repetitive input arguments in methods such as 'atom_indices', 'frame_indices', 'engine', 'syntaxis', 'form', ...
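
A sketch of what such a decorator could look like, using inspect.signature to find the arguments by name. The digestion functions, the accepted syntaxis values other than 'MolSysMT', and the example function `get_n_atoms` are assumptions made for illustration only.

```python
import functools
import inspect


# Hypothetical digestion functions, one per argument name.
def _digest_syntaxis(value, caller):
    if value not in ('MolSysMT', 'MDTraj'):
        raise ValueError(f"Unknown syntaxis {value!r} in '{caller}'.")
    return value


def _digest_atom_indices(value, caller):
    if isinstance(value, str) and value == 'all':
        return value
    return sorted(int(ii) for ii in value)


_DIGESTERS = {'syntaxis': _digest_syntaxis, 'atom_indices': _digest_atom_indices}


def digest(func):
    """Digest every known argument of `func` before calling it."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in list(bound.arguments.items()):
            if name in _DIGESTERS:
                bound.arguments[name] = _DIGESTERS[name](value, caller=func.__name__)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@digest
def get_n_atoms(atom_indices='all', syntaxis='MolSysMT'):
    return atom_indices, syntaxis
```

The body of the decorated function stays free of checking code, and the wrapper already knows `func.__name__`, so error messages can name the calling function without an extra argument.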

Is the input argument 'caller' in private exception classes avoidable?

@Daniel-Ibarrola did a great job making sure that exceptions can report useful messages (see #79). To do that, a new input argument 'caller' was included in the digestion methods. After that, these exceptions work as we wanted. 👍🏻

I refuse to think that the problem cannot be solved with the library inspect. I am probably wrong. But I would like to keep this issue open to take a second look at the question: is this input argument 'caller' avoidable? 🤔

Do we need a private method to compare selections, indices, and structure_indices with 'all'?

@Daniel-Ibarrola, I am in favor of implementing a method to compare selections, indices, and structure_indices with 'all'.

I can do it later. I just want to know your opinion.

Better way to introduce molecular simulation parameters in methods

Many methods now accept molecular simulation parameters. These parameters are introduced one by one, but they are many, and sometimes arguments depend on other arguments. Dealing with them would be probably easier and safer if they were packed in a single object with a couple of methods to check their sanity.
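
As a sketch of the idea, the parameters could live in a small dataclass that validates itself on creation. The field names, units, defaults and the specific rules below are illustrative assumptions, not the parameters MolSysMT actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimulationParameters:
    # Field names, units and defaults are illustrative assumptions.
    temperature_kelvin: float = 300.0
    timestep_fs: float = 2.0
    pressure_bar: Optional[float] = None        # None means constant volume
    barostat_frequency: Optional[int] = None    # only meaningful with a pressure

    def __post_init__(self):
        # Independent sanity checks.
        if self.temperature_kelvin <= 0:
            raise ValueError("temperature_kelvin must be positive.")
        if self.timestep_fs <= 0:
            raise ValueError("timestep_fs must be positive.")
        # A check on an argument that depends on another argument.
        if self.barostat_frequency is not None and self.pressure_bar is None:
            raise ValueError("barostat_frequency needs pressure_bar to be set.")


parameters = SimulationParameters(temperature_kelvin=310.0, pressure_bar=1.0,
                                  barostat_frequency=25)
```

A method would then accept a single `parameters` argument, and inconsistent combinations would fail when the object is built rather than deep inside the simulation setup.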

Importing MolSysMT takes too long

Importing MolSysMT takes too long. This needs to be improved.
We should identify what takes so much time. Let's, for instance, time how long it takes to import every external and internal module to identify the problem. I have the impression that, for example, pyunitwizard takes a long time because unyt takes quite a while to be imported.
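
A simple way to start, sketched below with standard-library modules as stand-ins for the real dependencies, is to time each import separately. The function name `time_imports` is hypothetical. Modules that are already loaded show up as near zero, so the measurement is only meaningful in a fresh interpreter.

```python
import importlib
import time


def time_imports(module_names):
    """Return (module, seconds) pairs sorted from slowest to fastest import."""
    timings = {}
    for name in module_names:
        start = time.perf_counter()
        importlib.import_module(name)
        timings[name] = time.perf_counter() - start
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


# Stand-in module names; replace them with the dependencies under suspicion.
report = time_imports(['json', 'decimal', 'fractions'])
for name, seconds in report:
    print(f"{name:12s} {seconds:.4f} s")
```

CPython also offers a built-in breakdown of nested imports with `python -X importtime -c "import molsysmt"`, which prints the cumulative and self time of every module imported along the way.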

A tool to work with the whole PDB database

We could wrap the workflow to work with the whole PDB database into two or three methods.
More info here:
https://tshi.page/ox/bioinformatics/download-and-sync-with-the-entire-pdb-database.html
https://mmtf.rcsb.org/
https://mmtf.rcsb.org/start-with-python.html
https://mmtf.rcsb.org/download.html
https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile

Error converting system 1NCR: "The bioassembly has more than a transformation"

When the mmtf from the pdb id 1NCR is read, the code gives an error because there is more than a single transformation matrix for the bioassembly.

Is there a way to add a color bar to an nglview.NGLWidget?

When the MolSysMT function molsysmt.thirds.nglview.color_by_value is used, it would be useful to have a color bar in the resulting view. We should look for possible ways to do it.

One way to do it can be embedding the NGLWidget in an external widget box with the output of matplotlib (the color bar). Something similar to what is implemented in pychemcurv in the function map_view.

Wrong topology in mdtraj.Topology from nv.datafiles.GRO

Wrong molecules in mdtraj.Topology when:

t = msm.convert(nv.datafiles.GRO, to_form='mdtraj.Topology')

Can digest_box_vectors, digest_box_lengths, digest_box_angles be merged in a single method?

@Daniel-Ibarrola suggested in #54 removing the methods digest_box_lengths and digest_box_angles. The argument was clear: they were doing the same thing. That's why the changes were accepted. But we should analyse whether this decision has other implications. We could, for instance, temporarily include digest_box_lengths and digest_box_angles as links to a common method.

In any case, digest_box_vectors should probably be renamed to digest_box. This would also solve #58.

Error viewing system 1TCD

The following sequence of commands throws an error:

import molsysmt as msm
mol_system_2 = msm.convert("1TCD", "molsysmt.MolSys")
mol_system_2 = msm.remove_solvent(mol_system_2, water=True, ions=True)
mol_system_2 = msm.add_missing_hydrogens(mol_system_2, pH=7.4)
msm.view(mol_system_2)

Implementation of filter to import the classes from modules installed only.

MolSysMT loads all classes defined by default. But this shouldn't be the case.
MolSysMT should load only those classes belonging to modules already installed in the environment.
Otherwise, every defined class appears to be a dependency.
This needs to be fixed before submitting a manuscript introducing this tool.
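
One way to implement such a filter, sketched under the assumption that each form can be mapped to the top-level package it needs, is to ask importlib whether the package can be found before registering the form. The mapping and the function name below are hypothetical, and a standard-library entry is included only so the example always finds something.

```python
import importlib.util

# Hypothetical mapping from a form name to the top-level package it requires.
_FORM_REQUIREMENTS = {
    'mdtraj.Topology': 'mdtraj',
    'nglview.NGLWidget': 'nglview',
    'json.JSONDecoder': 'json',                      # standard-library stand-in
    'missing.Form': 'surely_not_installed_pkg_xyz',  # deliberately absent
}


def available_forms(requirements):
    """Return the forms whose required package is installed, without importing it."""
    return [
        form for form, package in requirements.items()
        if importlib.util.find_spec(package) is not None
    ]


forms = available_forms(_FORM_REQUIREMENTS)
print(forms)
```

Since `find_spec` only locates the package and does not import it, this check is cheap and does not add to the import time of the library.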
