
molsysmt's Introduction

MolSysMT

License: MIT

Installation | Documentation | License | Credits | Team

Molecular Systems Multi-Tool

This library was conceived as a humble frontend to make life easier for a computational molecular biology lab, the UIBCDF. MolSysMT is designed to cover specific needs, or to speed up workflows, when you are working with tools such as:

  • MDTraj
  • MDAnalysis
  • PDBFixer
  • OpenMM
  • Yank
  • HTMD
  • PyEmma
  • ParmEd
  • NGLview
  • pdbtools?

Although MolSysMT was not conceived to do what other tools do better, this toolkit can be used alone to do a few simple tasks.

All credit should be given to the developers and maintainers of these packages and the libraries they depend on.

Molecular Systems:

  • All the material on building the molecular system, together with its parameters and topology, should go here.
  • This should also be prepared to work with molecules such as ligands.

Add ParmEd and OpenBabel: parmed.github.io http://openbabel.org/wiki/Main_Page

pdbfixer

Molecular Dynamics:

Installation

Dependencies

  • Fortran compiler (gfortran or Intel Fortran compiler)
  • LAPACK ('conda install lapack' would work?)

Other Python packages, such as those mentioned here (link to section) and included in this list (file).

Conda

Updating

GitHub

git clone git@github.com:UIBCDF/MolSysMT.git
cd MolSysMT
python setup.py develop
pip uninstall molsysmt

Updating

To be written

Documentation

http://www.uibcdf.org/MolSysMT/

License

Credits

All credit should be given to the developers and maintainers of the following tools and dependencies:

...

Team

Maintainers

Diego Prada Gracia
Liliana M. Moreno Vargas

Contributors

...

Citation

Latest version DOI:

Cite the latest version with the following DOI provided by Zenodo:

DOI

Cite all versions?

You can cite all versions by using the following DOI. This DOI represents all versions, and will always resolve to the latest one:

DOI

Acknowledgments and Copyright

Copyright

Copyright (c) 2021, UIBCDF

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.5.

molsysmt's People

Contributors

daniel-ibarrola, dprada, lmmv


molsysmt's Issues

Some digestion methods need to be implemented

Some digestion methods were included as empty functions. They need to be implemented:

  • digest_box
  • digest_comparison
  • digest_coordinates

And some need to be completed or adjusted:

  • digest_engine
  • digest_selection

Hbonds output formats

What would be the best way to encode the list of hydrogen bonds found in a trajectory?

  • A dictionary of lists (with frame indices where the bond was formed)
  • A sparse tensor (3d with the library "Sparse")
  • An array of arrays with different shape (number of frames x lists of detected hbonds)
  • A list of sparse matrices (as mdtraj)
  • A native Hbond class where the info is stored as a sparse tensor, with methods to quickly access different formats.
  • ...

This is not clear yet.
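To make the first option concrete, here is a minimal sketch that assumes each hydrogen bond is identified by a (donor, hydrogen, acceptor) atom-index triplet, with one collection of triplets per frame (the shape of the output of mdtraj's wernet_nilsson). The functions `hbonds_by_bond` and `occupancy` are hypothetical and not part of MolSysMT.

```python
from collections import defaultdict


def hbonds_by_bond(frames_hbonds):
    """Map each (donor, hydrogen, acceptor) triplet to the frames where it is found.

    frames_hbonds[i] is an iterable of triplets detected in frame i.
    """
    by_bond = defaultdict(list)
    for frame_index, hbonds in enumerate(frames_hbonds):
        for hbond in hbonds:
            # int() also turns NumPy integers into plain, hashable Python ints
            by_bond[tuple(int(ii) for ii in hbond)].append(frame_index)
    return dict(by_bond)


def occupancy(by_bond, n_frames):
    """Fraction of frames in which each hydrogen bond is present."""
    return {bond: len(frames) / n_frames for bond, frames in by_bond.items()}


frames_hbonds = [[(0, 1, 5)], [(0, 1, 5), (2, 3, 7)], []]
by_bond = hbonds_by_bond(frames_hbonds)
print(by_bond)  # {(0, 1, 5): [0, 1], (2, 3, 7): [1]}
print(occupancy(by_bond, n_frames=len(frames_hbonds)))
```

The dictionary stays compact when bonds are long-lived, and building a dense frames-by-bonds matrix or a sparse tensor from it is a short loop. That is one argument for a native class that converts on demand.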

Proteins, small molecules and entities names

MolSysMT has to give a name to proteins, small molecules, and entities. These names can be extracted from an mmtf file, but what happens with other forms? We probably need to implement tools in Sabueso to find these names, together with other attributes, from files and databases.

Do we need a MolecularSystemNeeded error?

In @Daniel-Ibarrola's PR the following code in basic/set.py:

    if check:

        if not is_molecular_system(molecular_system):
            raise MolecularSystemNeeded()

was replaced by:

    if check:

        if not is_molecular_system(molecular_system):
            raise TypeError("A molecular system is needed.")

I guess that, at this point, the current process should look something like:

    if check:
        digest_single_molecular_system(molecular_system)

And it is inside the digestion where the error should be raised. This needs to be reviewed. Do we already have a MolecularSystemNeeded error to be raised in this situation?
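One way to keep both options is sketched below. It assumes a hypothetical MolecularSystemNeededError defined as a subclass of TypeError, so the digestion can raise the specific error while existing `except TypeError` handlers keep working. The `is_molecular_system` stand-in only accepts strings and lists for the sake of the example. The real check lives in MolSysMT.

```python
class MolecularSystemNeededError(TypeError):
    """Hypothetical error: a TypeError subclass, so `except TypeError` still catches it."""

    def __init__(self, message="A molecular system is needed."):
        super().__init__(message)


def is_molecular_system(molecular_system):
    # Stand-in for MolSysMT's own check, for this sketch only
    return isinstance(molecular_system, (str, list))


def digest_single_molecular_system(molecular_system):
    """Return the input unchanged, or raise the specific error if it is not valid."""
    if not is_molecular_system(molecular_system):
        raise MolecularSystemNeededError()
    return molecular_system


# A caller written against the TypeError version still works
try:
    digest_single_molecular_system(42)
except TypeError as error:
    print(type(error).__name__, "-", error)
```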

A method is needed to create biological assemblies.

In the Protein Data Bank, the same structure can contain more than one molecular system. Some structures have proposed biological assemblies (#32). In this case, the pdb or mmtf file contains a unit molecular system and a list of geometrical transformations to obtain the assemblies. A possible way to handle this situation is the following:

  • The mmtf or pdb is converted always without applying any geometrical transformation.
  • A warning message is printed out if the PDB entry has more than one molecular system (that is, if there are geometrical transformations to create biological assemblies).
  • A method has to be included in MolSysMT with the capability to extract these geometrical transformations from mmtf files, pdb files or PDB ids, producing a new molecular system representing a particular biological assembly.
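The geometric part of the last step could look like the sketch below. It assumes each transformation is given as a 3x4 matrix [R|t] (a rotation plus a translation), which is how PDB files list them in REMARK 350 BIOMT records. `build_assembly` is a hypothetical name. It only stacks transformed copies of the coordinates and ignores the topology, which would also need to be replicated.

```python
import numpy as np


def build_assembly(coordinates, transformations):
    """Stack one transformed copy of the coordinates per 3x4 [R|t] matrix.

    coordinates: array of shape (n_atoms, 3).
    transformations: iterable of (3, 4) matrices; the first three columns are
    the rotation R and the last column is the translation t.
    """
    xyz = np.asarray(coordinates, dtype=float)
    copies = []
    for matrix in transformations:
        matrix = np.asarray(matrix, dtype=float)
        rotation, translation = matrix[:, :3], matrix[:, 3]
        copies.append(xyz @ rotation.T + translation)  # R @ x + t for every atom
    return np.concatenate(copies, axis=0)


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
half_turn_z = [[-1, 0, 0, 10], [0, -1, 0, 0], [0, 0, 1, 0]]  # 180 degrees about z, then +10 in x
print(build_assembly([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]], [identity, half_turn_z]))
```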

Are former auxiliary functions in private module `lists_and_tuples` necessary?

The following functions in a former version of _private/lists_and_tuples (see the file) are probably called by code waiting to be re-implemented. This needs to be checked before confirming their removal.

def list_to_csv_string(obj):

    return ",".join([str(ii) for ii in obj])

def list_to_ssv_string(obj):

    return " ".join([str(ii) for ii in obj])

def are_equal_sets(objects1, objects2):

    if not is_list_or_tuple(objects1):
        objects1=[objects1]
    if not is_list_or_tuple(objects2):
        objects2=[objects2]

    output = False
    if set(objects1)==set(objects2):
        output = True

    return output
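If `are_equal_sets` survives the review, it reduces to a single set comparison. This is a minimal sketch. It assumes the missing `is_list_or_tuple` helper was an `isinstance` check against list and tuple, and `_as_list` is a hypothetical name.

```python
def _as_list(obj):
    # Assumed behaviour of the former is_list_or_tuple check: wrap anything else
    return list(obj) if isinstance(obj, (list, tuple)) else [obj]


def are_equal_sets(objects1, objects2):
    """True if both inputs contain the same distinct elements."""
    return set(_as_list(objects1)) == set(_as_list(objects2))


print(are_equal_sets([1, 2, 2], (2, 1)))  # True
print(are_equal_sets(3, [3]))             # True
print(are_equal_sets("CA", ["CB"]))       # False
```

The two string helpers are already one-liners (`",".join(map(str, obj))`), so they could be inlined at their call sites.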

Can the omnipresent argument "check" be avoided?

Having the argument "check" everywhere in the code is annoying. It is necessary at this point of development, but probably avoidable.

For example, there could be a simple way to check the validity of a function's input arguments only when the function is invoked from outside the library's code (a script, a Jupyter session, etc.). Can this be implemented in a simple way?

We should also consider the following situation: if checking input arguments in functions is computationally expensive, should we give users the option to skip it?
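One simple approach is sketched below. It assumes the decision can be made from the module name of the calling frame: a hypothetical decorator runs a validator only when the caller's `__name__` lies outside the package. It relies on `inspect.currentframe()`, which is a CPython implementation detail. The names `check_if_external`, `_validate_n_copies` and `n_copies` are illustrative only.

```python
import functools
import inspect


def check_if_external(package_name, validator):
    """Run validator(*args, **kwargs) only when the call comes from outside package_name."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            frame = inspect.currentframe()
            try:
                # f_back is the frame that called the decorated function
                caller_module = frame.f_back.f_globals.get("__name__", "")
            finally:
                del frame  # break the reference cycle created by holding a frame
            internal = (caller_module == package_name
                        or caller_module.startswith(package_name + "."))
            if not internal:
                validator(*args, **kwargs)
            return func(*args, **kwargs)

        return wrapper

    return decorator


def _validate_n_copies(n):
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer.")


@check_if_external("mypkg", _validate_n_copies)
def n_copies(n):
    return ["copy"] * n


print(n_copies(2))  # validated: this module is not part of mypkg
```

Calls made from modules whose `__name__` is `mypkg` or starts with `mypkg.` skip the validation. That would replace passing `check=False` internally. The frame lookup itself is cheap but not free.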

New class iterator in the module `structure` to work with trajectories.

In my opinion, the only important functionality left to implement before publishing this is something to work with trajectories. We could probably discuss here how this tool could be shaped.

I would like to avoid the additional auxiliary classes that most libraries have (mdtraj or MDAnalysis). In my opinion, the power of the tool should be concentrated in a single place: an iterator.

What do we need? We need a tool to extract frames (step, time, coordinates, and boxes) in a smart and simple way when these structures are stored in a trajectory file. However, its use should not be restricted to trajectory files.

As a user, I would like to do something like this:

import molsysmt as msm

iterator = msm.structure.iterator('traj.h5', start=100, interval=10, stop=200, selection='atom_name=="CA"')

for step, time, coordinates, box in iterator:
    print(step, time, coordinates, box)

iterator.close()

or

import molsysmt as msm

iterator = msm.structure.iterator(['traj.gro', 'traj.xtc'], indices=[10, 22, 30, 78, 145], selection=[10, 11, 12])

for step, time, coordinates, box in iterator:
    print(step, time, coordinates, box)

iterator.close()

or

import molsysmt as msm

iterator = msm.structure.iterator('traj.dcd', start=1, interval=10, stop=1000, chunk_size=20)

for _, chunk_times, chunk_coordinates, _ in iterator:
    print(chunk_times, chunk_coordinates)

iterator.close()

This iterator could have the following instantiation arguments:

  • molecular_system: the standard molecular system input.
  • start: first structure index of the trajectory to start with. [default: 0]
  • interval: stride between the structure indices visited in consecutive iterations. [default: 1]
  • stop: the iteration finishes when the current structure index is larger than or equal to this integer. [default: None]
  • chunk_size: number of structures in the output of each iteration. [default: 1]
  • selection: atom selection. [default: 'all']
  • syntaxis: syntax used in the selection. [default: 'MolSysMT']

I am not sure whether, to implement this iterator, we need to define a specific structures_iterator in the module item for each form (trajectory file or form with structures in general).
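As a starting point for discussion, here is a minimal sketch for the case where the structures are already in memory as a NumPy array of shape (n_structures, n_atoms, 3). The class name `StructuresIterator` and its defaults for step and time are assumptions. The sketch accepts only 'all' or a list of atom indices as the selection; a selection string would first have to be converted to indices. A file-backed version would read only the requested frames in `__iter__` and release the file in `close()`. The selection is type-checked before it is compared with 'all', so an array is never compared with a string.

```python
import numpy as np


class StructuresIterator:
    """Hypothetical iterator over in-memory structures, returning chunks."""

    def __init__(self, coordinates, step=None, time=None, box=None,
                 start=0, interval=1, stop=None, chunk_size=1, selection="all"):
        self.coordinates = np.asarray(coordinates)
        n_structures = self.coordinates.shape[0]
        # Default step and time: one unit per structure index
        self.step = np.arange(n_structures) if step is None else np.asarray(step)
        self.time = np.arange(n_structures, dtype=float) if time is None else np.asarray(time)
        self.box = None if box is None else np.asarray(box)
        stop = n_structures if stop is None else min(stop, n_structures)
        # Structure indices start, start + interval, ... strictly below stop
        self.indices = np.arange(start, stop, interval)
        self.chunk_size = chunk_size
        if isinstance(selection, str) and selection == "all":
            self.atoms = slice(None)
        else:
            self.atoms = np.asarray(selection)  # atom indices

    def __iter__(self):
        for first in range(0, len(self.indices), self.chunk_size):
            chunk = self.indices[first:first + self.chunk_size]
            coordinates = self.coordinates[chunk][:, self.atoms]
            box = None if self.box is None else self.box[chunk]
            yield self.step[chunk], self.time[chunk], coordinates, box

    def close(self):
        pass  # nothing to release for in-memory data


coordinates = np.random.default_rng(0).random((10, 4, 3))
iterator = StructuresIterator(coordinates, start=1, interval=3, stop=9,
                              chunk_size=2, selection=[0, 2])
for step, time, chunk_coordinates, box in iterator:
    print(step, chunk_coordinates.shape)
iterator.close()
```

With chunk_size=1, every item is still an array of length one, which keeps the output shape the same whatever the chunk size.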

@Daniel-Ibarrola: What do you think? Do you have an alternative proposal?

Error when importing MolModMT in Jupyter-lab

When I imported MolModMT in jupyter-lab I got the following error:

import os
import mdtraj as md
from pdbfixer import PDBFixer
import pdbfixer as pdb
from simtk.openmm.app import PDBFile
import molmodmt as mt

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/amber.py in find_gaff_dat()
    272         try:
--> 273             AMBERHOME = os.path.split(full_path)[0]
    274             AMBERHOME = os.path.join(AMBERHOME, "../")

~/miniconda3/envs/AnalysisMD/lib/python3.6/posixpath.py in split(p)
    106     everything after the final slash.  Either part may be empty."""
--> 107     p = os.fspath(p)
    108     sep = _get_sep(p)

TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-4-db124cf99dc6> in <module>
      4 import pdbfixer as pdb
      5 from simtk.openmm.app import PDBFile
----> 6 import molmodmt as mt

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/__init__.py in <module>
----> 1 from .multitool import *
      2 from . import utils
      3 from . import moldyn as moldyn
      4 from . import molsys as molsys
      5 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/multitool.py in <module>
     10 
     11 ## Classes
---> 12 from .formats.classes import dict_is_form as _dict_classes_is_form, \
     13     list_forms as _list_classes_forms, \
     14     dict_converter as _dict_classes_converter, \

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/formats/classes/__init__.py in <module>
----> 1 from .base import *

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/formats/classes/base.py in <module>
     18 
     19 for api_form in list_api_forms:
---> 20     module_api_form=_import_module('.'+api_form,base_package)
     21     form_name=module_api_form.form_name
     22     list_forms.append(form_name)

~/miniconda3/envs/AnalysisMD/lib/python3.6/importlib/__init__.py in import_module(name, package)
    124                 break
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 
    128 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/molmodmt/formats/classes/api_yank_Topography.py in <module>
      1 from os.path import basename as _basename
----> 2 from yank import Topography as _yank_Topography
      3 
      4 form_name=_basename(__file__).split('.')[0].replace('api_','').replace('_','.')
      5 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/yank/__init__.py in <module>
     12 from . import utils
     13 from . import multistate
---> 14 from . import restraints
     15 from . import pipeline
     16 from . import experiment

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/yank/restraints.py in <module>
     35 from simtk import openmm, unit
     36 
---> 37 from . import pipeline
     38 from .utils import methoddispatch, generate_development_feature
     39 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/yank/pipeline.py in <module>
     29 import numpy as np
     30 import openmmtools as mmtools
---> 31 import openmoltools as moltools
     32 from pdbfixer import PDBFixer
     33 from simtk import openmm, unit

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/__init__.py in <module>
     24 """
     25 
---> 26 from openmoltools import amber_parser, system_checker, utils, packmol, openeye, amber, cirpy, gromacs, schrodinger

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/openeye.py in <module>
      3 import mdtraj as md
      4 from openmoltools.utils import import_, enter_temp_directory, create_ffxml_file
----> 5 from openmoltools.amber import run_antechamber
      6 import logging
      7 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/amber.py in <module>
    281     return os.path.join(AMBERHOME, 'dat', 'leap', 'parm', 'gaff.dat')
    282 
--> 283 GAFF_DAT_FILENAME = find_gaff_dat()
    284 
    285 

~/miniconda3/envs/AnalysisMD/lib/python3.6/site-packages/openmoltools/amber.py in find_gaff_dat()
    274             AMBERHOME = os.path.join(AMBERHOME, "../")
    275         except:
--> 276             raise(ValueError("Cannot find AMBER GAFF"))
    277 
    278     if AMBERHOME is None:

ValueError: Cannot find AMBER GAFF

It looks like something related to the AMBERHOME path. Should I install Amber to fix the problem? I am managing MolModMT via Conda.
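The traceback shows that openmoltools fails while locating GAFF's gaff.dat. It derives AMBERHOME from a path that turned out to be None, and then looks for dat/leap/parm/gaff.dat under it. A quick check of what the environment provides is sketched below, assuming that relative location (taken from the traceback). `expected_gaff_dat` is a hypothetical helper, not part of openmoltools.

```python
import os


def expected_gaff_dat(environ=None):
    """Return the gaff.dat path implied by AMBERHOME, or None if it is not set."""
    environ = os.environ if environ is None else environ
    amberhome = environ.get("AMBERHOME")
    if not amberhome:
        return None
    # Same relative location that openmoltools joins in the traceback
    return os.path.join(amberhome, "dat", "leap", "parm", "gaff.dat")


path = expected_gaff_dat()
if path is None:
    print("AMBERHOME is not set")
else:
    print(path, "exists" if os.path.isfile(path) else "is missing")
```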

Include rdkit molecular forms

Including rdkit forms is necessary to use MolSysMT in OpenPharmacophore. The classes to be included are:

  • rdkit.Chem.rdchem.Mol

Documentation restructuring

The structure of the documentation needs to be cleaned and reorganized.
This is urgent, since other members of the lab need to start using MolSysMT.

There are faster libraries to make queries in a DataFrame

There are faster libraries than Pandas. Having the topology and the bonds as DataFrames is useful, since we can, at any time, try other libraries such as Vaex, PySpark, Dask, ..., which seem to be more efficient than Pandas. We have to adopt whichever one is fastest at querying large systems. This is one of the strongest points of MolSysMT: the possibility of being the fastest library at loading molecular systems and selecting atoms (in both small and very large systems), together with the fact that the topology is stored in a standard type of object used by many libraries.
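Choosing a library requires measuring the same query on the same table. A minimal timing harness is sketched below. It compares pandas' `DataFrame.query` with a plain NumPy boolean mask on a synthetic atoms table. The column name `atom_name`, the table size and the function names are assumptions, and other backends (Vaex, Dask, ...) could be added as further entries. No results are claimed here, because they depend on the machine and the library versions.

```python
import time

import numpy as np
import pandas as pd


def best_time(func, repeats):
    """Minimum wall time over several repeats, plus the last result."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = func()
        times.append(time.perf_counter() - start)
    return min(times), result


def compare_queries(n_atoms=1_000_000, repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    atoms = pd.DataFrame({"atom_name": rng.choice(["N", "CA", "C", "O", "OW"], size=n_atoms)})
    names = atoms["atom_name"].to_numpy()

    t_query, by_query = best_time(
        lambda: atoms.query('atom_name == "CA"').index.to_numpy(), repeats)
    t_mask, by_mask = best_time(lambda: np.flatnonzero(names == "CA"), repeats)

    # Both approaches must select exactly the same atom indices
    if not np.array_equal(by_query, by_mask):
        raise RuntimeError("The two queries selected different atoms.")
    return {"pandas.query": t_query, "numpy mask": t_mask}


print(compare_queries(n_atoms=100_000))
```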

basic.contains() function should work on attributes and selections

The method 'contains' should be refactored to work on attributes and selections.

For example, the following three commands should be equivalent:

msm.contains(molecular_system, waters=True)

msm.contains(molecular_system, n_waters=True)

msm.contains(molecular_system, selection='molecule_type=="water"')
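This is a minimal sketch of that refactoring. It assumes the topology is available as a pandas DataFrame with a `molecule_type` column, and it translates every attribute keyword into an equivalent selection string. A keyword set to True means "must be present", and False means "must be absent". The table `ATTRIBUTE_SELECTIONS` and the `atoms` argument are stand-ins for illustration, not MolSysMT's API.

```python
import pandas as pd

# Hypothetical keyword-to-selection table: each keyword is shorthand for a selection
ATTRIBUTE_SELECTIONS = {
    "waters": 'molecule_type=="water"',
    "n_waters": 'molecule_type=="water"',
    "ions": 'molecule_type=="ion"',
    "proteins": 'molecule_type=="protein"',
}


def contains(atoms, selection=None, **attributes):
    """True if each requested selection is present (keyword=True) or absent (keyword=False)."""
    requests = [] if selection is None else [(selection, True)]
    for name, wanted in attributes.items():
        requests.append((ATTRIBUTE_SELECTIONS[name], bool(wanted)))
    return all((len(atoms.query(query)) > 0) == wanted for query, wanted in requests)


atoms = pd.DataFrame({"molecule_type": ["protein", "protein", "water", "water"]})
print(contains(atoms, waters=True),
      contains(atoms, n_waters=True),
      contains(atoms, selection='molecule_type=="water"'))  # True True True
```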

A molecular system can be a set of items, not only one.

A molecular system can be defined by a set of items, not only one.
This means that all methods should be able to work with a list of items, not just one.
For example, having one item for the topology, a second item for the coordinates, and a third item for the box requires auxiliary methods to deal with it.
In addition, this has to be documented as something implicit in the philosophy of MolSysMT.
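This is a minimal sketch of the kind of auxiliary method this requires. It assumes each item can be asked whether it provides an attribute, and plain dictionaries stand in for real items. `get_from_items` is a hypothetical name. The rule used here (the first item that provides the attribute wins) is one possible convention, not a decided one.

```python
def get_from_items(items, attribute):
    """Return attribute from the first item that provides it (hypothetical helper)."""
    if not isinstance(items, (list, tuple)):
        items = [items]  # a single item is a molecular system with one item
    for item in items:
        if attribute in item:
            return item[attribute]
    raise AttributeError(f"No item in the molecular system provides '{attribute}'.")


topology = {"atom_name": ["N", "CA", "C"]}
structures = {"coordinates": [[0.0, 0.0, 0.0], [0.15, 0.0, 0.0], [0.3, 0.0, 0.0]]}
box = {"box": None}
molecular_system = [topology, structures, box]

print(get_from_items(molecular_system, "atom_name"))  # ['N', 'CA', 'C']
print(get_from_items(molecular_system, "box"))        # None
```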

Adding functions to get and add missing bonds

The need for two functions to get and add missing bonds from/to a molecular system arises when working with the demo files shipped with NGLView: the gro file has no bonds, and the bonds in the pdb file are not properly processed.

The way to include or detect covalent bonds should be similar to what is included in pytraj and probably in ParmEd (this last library should be checked).

These two functions should be added to the module build.
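The usual distance criterion for detecting covalent bonds can be sketched as follows: two atoms are bonded if they are closer than the sum of their covalent radii plus a tolerance. This is a generic heuristic, not necessarily what pytraj or ParmEd do. The radii table, the 0.4 Å tolerance, the use of angstroms (MolSysMT usually works in nanometers) and the name `guess_bonds` are all assumptions. A production version would use a neighbor list instead of the full distance matrix.

```python
import numpy as np

# Assumed covalent radii in angstroms (single-bond values); extend as needed
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def guess_bonds(elements, coordinates, tolerance=0.4):
    """Pairs (i, j), i < j, closer than the sum of their covalent radii plus tolerance.

    coordinates are in angstroms. Memory grows as n_atoms**2, so this brute-force
    version is only meant for small systems.
    """
    xyz = np.asarray(coordinates, dtype=float)
    radii = np.array([COVALENT_RADII[element] for element in elements])
    distances = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    cutoffs = radii[:, None] + radii[None, :] + tolerance
    bonded = np.triu(distances < cutoffs, k=1)  # upper triangle: each pair once, no self-pairs
    i, j = np.nonzero(bonded)
    return list(zip(i.tolist(), j.tolist()))


# A water molecule: both O-H bonds are detected, the H...H pair is not
print(guess_bonds(["O", "H", "H"], [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]))
# [(0, 1), (0, 2)]
```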

Exceptions reporting more information

We would like to have exceptions reporting more information. For example (see this former version of not_implemented_version_error.py):

  • The name of the method raising the exception.
  • The name of the wrong argument
  • ...

For instance, if the user asks for a conversion that is not implemented, raising an exception with the following message would be informative:

message = (
    f"The conversion from {from_form} to {to_form} in \"{caller_name}\" has not been implemented yet. "
    f"Check {api_doc} for more information. "
    f"Write a new issue in {__github_issues_web__} asking for its implementation."
)

In order to do this, as @Daniel-Ibarrola noted in #54 (comment), we need to work properly with `stack` from the `inspect` library. Example:

class NotImplementedConversionError(NotImplementedError):  # class name and base class assumed

    def __init__(self, from_form, to_form):

        from molsysmt import __github_issues_web__
        from inspect import stack

        # stack()[0] is this __init__; stack()[1] is the frame that created the exception
        all_stack_frames = stack()
        caller_stack_frame = all_stack_frames[1]
        caller_name = caller_stack_frame[3]  # item 3 of a FrameInfo is the function name

        api_doc = ''

        message = (
            f"The conversion from {from_form} to {to_form} in \"{caller_name}\" has not been implemented yet. "
            f"Check {api_doc} for more information. "
            f"Write a new issue in {__github_issues_web__} asking for its implementation."
        )

        super().__init__(message)

Comparing a selection (taking the value of a numpy ndarray) with 'all'

Throughout the code, selection is compared with 'all' many times, in lines such as:

    if indices is None:
        if selection != 'all':
            indices = select(molecular_system, element=element, selection=selection,
                    syntaxis=syntaxis, check=False)
        else:
            indices = 'all'

But this code gives the following warning if selection is a numpy ndarray:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison

Is this a potential source of errors in the future?
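Probably yes. The FutureWarning announces that NumPy will switch to an elementwise comparison. With an elementwise result, `if selection != 'all':` is expected to raise a ValueError for any array with more than one element, because the truth value of such an array is ambiguous. A minimal guard, using the hypothetical helper name `is_all`, is to check the type before comparing:

```python
import numpy as np


def is_all(selection):
    """True only for the string 'all'; arrays and lists are never compared with a str."""
    return isinstance(selection, str) and selection == "all"


for selection in ["all", 'atom_name=="CA"', np.array([10, 11, 12]), [0, 1]]:
    print(repr(selection), is_all(selection))
```

The lines above would then read `if not is_all(selection):`, and the behaviour no longer depends on how NumPy compares arrays with strings.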

Is the input argument 'caller' in private exception classes avoidable?

@Daniel-Ibarrola did a great job making sure that exceptions can report useful messages (see #79). To do that, a new input argument 'caller' was included in the digestion methods. After that, these exceptions work as we wanted. 👍🏻

I refuse to believe that the problem cannot be solved with the inspect library. I am probably wrong, but I would like to keep this issue open to give the question a second round: is this input argument 'caller' avoidable? 🤔
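A possible answer is sketched below, assuming all digestion functions share the prefix `digest_`. The exception walks up the call stack from the place where it was created, skips the digestion frames, and reports the first function outside them. `ArgumentError` and `select` are hypothetical names. Frame inspection with `inspect.currentframe()` is CPython-specific.

```python
import inspect


class ArgumentError(ValueError):
    """Hypothetical exception that finds the public function it was raised for."""

    def __init__(self, argument, skip_prefixes=("digest_",)):
        caller_name = None
        frame = inspect.currentframe().f_back  # the frame that created the exception
        try:
            while frame is not None:
                name = frame.f_code.co_name
                # Skip digestion helpers and __init__ of exception subclasses
                if name != "__init__" and not name.startswith(skip_prefixes):
                    caller_name = name
                    break
                frame = frame.f_back
        finally:
            del frame  # avoid keeping frames alive through a reference cycle
        self.argument = argument
        self.caller_name = caller_name
        super().__init__(f'Wrong value for the argument "{argument}" in "{caller_name}".')


def digest_selection(selection):
    if selection is None:
        raise ArgumentError("selection")
    return selection


def select(selection=None):
    return digest_selection(selection)


try:
    select()
except ArgumentError as error:
    print(error)  # Wrong value for the argument "selection" in "select".
```

The price is a naming convention: a digestion helper that does not start with `digest_` would be reported as the caller.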

Better way to introduce molecular simulation parameters in methods

Many methods now accept molecular simulation parameters. These parameters are introduced one by one, but they are many, and sometimes arguments depend on other arguments. Dealing with them would be probably easier and safer if they were packed in a single object with a couple of methods to check their sanity.
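This is a minimal sketch of such an object, built as a dataclass whose `check` method runs on construction. The field names, defaults, units and the two dependency rules are illustrative assumptions. Plain floats are used here, whereas MolSysMT would likely use quantities with units.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulationParameters:
    """Hypothetical container for molecular simulation parameters."""

    integrator: str = "Langevin"
    temperature: float = 300.0        # kelvin
    time_step: float = 2.0            # femtoseconds
    friction: Optional[float] = 1.0   # 1/picosecond, only meaningful for Langevin

    def __post_init__(self):
        self.check()

    def check(self):
        """Validate single values and the dependencies between them."""
        if self.temperature <= 0:
            raise ValueError("temperature must be positive.")
        if self.time_step <= 0:
            raise ValueError("time_step must be positive.")
        if self.integrator == "Langevin" and self.friction is None:
            raise ValueError("A Langevin integrator needs a friction coefficient.")
        if self.integrator == "Verlet" and self.friction is not None:
            raise ValueError("friction is not used by a Verlet integrator.")


parameters = SimulationParameters(temperature=310.0)
verlet = SimulationParameters(integrator="Verlet", friction=None)
print(parameters)
print(verlet)
```

Methods would then accept a single parameters object. Calling `check()` again after modifying a field re-validates the dependencies.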

Importing MolSysMT takes too long

Importing MolSysMT takes too long. This needs to be improved.
We should identify what takes so much time. For instance, let's time how long it takes to import every external and internal module to locate the problem. I have the impression that, for example, pyunitwizard takes a long time because unyt takes quite a while to be imported.
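This is a minimal sketch of such a measurement, meant to run in a fresh interpreter. Modules already in `sys.modules`, including those pulled in by an earlier entry, are not re-imported, so the order of the list matters. `time_imports` is a hypothetical helper. For a per-module breakdown without extra code, CPython 3.7+ also provides `python -X importtime -c "import molsysmt"`, which writes a report to stderr.

```python
import importlib
import time


def time_imports(module_names):
    """Seconds spent importing each module, or None if it is not installed."""
    timings = {}
    for name in module_names:
        start = time.perf_counter()
        try:
            importlib.import_module(name)
        except ImportError:
            timings[name] = None
            continue
        timings[name] = time.perf_counter() - start
    return timings


for name, seconds in time_imports(["numpy", "unyt", "pyunitwizard", "molsysmt"]).items():
    print(f"{name:15s}", "not installed" if seconds is None else f"{seconds:.3f} s")
```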

Is there a way to add a color bar to an nglview.NGLWidget?

When the MolSysMT function molsysmt.thirds.nglview.color_by_value is used, having a color bar in the resulting view would be necessary. We should look for possible solutions.

One way to do it could be to embed the NGLWidget in an external widget box together with the matplotlib output (the color bar), similar to what is implemented in pychemcurv in the function map_view.
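This is a minimal sketch of that idea, assuming ipywidgets is available. NGLWidget is an ipywidgets widget, so it can be placed in an HBox. matplotlib renders only the color bar to PNG bytes, and an `ipywidgets.Image` shows it next to the view. `colorbar_png` and `view_with_colorbar` are hypothetical names. vmin, vmax and cmap must match whatever color_by_value used, which this sketch does not check.

```python
import io

from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
from matplotlib.figure import Figure


def colorbar_png(vmin, vmax, cmap="viridis", label=""):
    """Render a vertical color bar alone and return it as PNG bytes."""
    fig = Figure(figsize=(1.2, 4))  # no pyplot, so nothing is shown in the notebook
    ax = fig.add_subplot()
    mappable = ScalarMappable(norm=Normalize(vmin=vmin, vmax=vmax), cmap=cmap)
    fig.colorbar(mappable, cax=ax, label=label)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", bbox_inches="tight")
    return buffer.getvalue()


def view_with_colorbar(view, vmin, vmax, cmap="viridis", label=""):
    """Place an NGLWidget and its color bar side by side."""
    import ipywidgets as widgets  # imported here so colorbar_png works without it

    bar = widgets.Image(value=colorbar_png(vmin, vmax, cmap, label), format="png")
    return widgets.HBox([view, bar])
```

In a notebook, one would display `view_with_colorbar(view, vmin, vmax)` instead of `view`, where `view` is the NGLWidget already colored by value.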

Can digest_box_vectors, digest_box_lengths, digest_box_angles be merged in a single method?

@Daniel-Ibarrola suggested in #54 removing the methods digest_box_lengths and digest_box_angles. The argument was clear: they were doing the same thing. That's why the changes were accepted. But we should analyse whether this decision has other implications. We could, for instance, temporarily keep digest_box_lengths and digest_box_angles as links to a common method.

In any case, should digest_box_vectors be renamed to digest_box? This would also solve #58.
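The temporary links could look like the sketch below. It assumes a single `digest_box` that accepts any array whose last dimension has size 3 (vectors, lengths or angles), plus thin aliases that emit a DeprecationWarning before delegating to it. The validation rule and the helper `_deprecated_alias` are assumptions for illustration.

```python
import warnings

import numpy as np


def digest_box(box, caller=None):
    """Hypothetical common digestion for box vectors, lengths and angles."""
    if box is None:
        return None
    array = np.asarray(box, dtype=float)
    # Lengths/angles: (3,) or (n, 3); vectors: (3, 3) or (n, 3, 3)
    if array.ndim == 0 or array.ndim > 3 or array.shape[-1] != 3:
        raise ValueError(f"Wrong box value in {caller}: expected a last dimension of size 3.")
    return array


def _deprecated_alias(name):
    def alias(box, caller=None):
        warnings.warn(f"{name} is deprecated, use digest_box instead.",
                      DeprecationWarning, stacklevel=2)
        return digest_box(box, caller=caller)

    alias.__name__ = name
    return alias


digest_box_lengths = _deprecated_alias("digest_box_lengths")
digest_box_angles = _deprecated_alias("digest_box_angles")
```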

Error viewing system 1TCD

The following sequence of commands throws an error:

import molsysmt as msm
mol_system_2 = msm.convert("1TCD", "molsysmt.MolSys")
mol_system_2 = msm.remove_solvent(mol_system_2, water=True, ions=True)
mol_system_2 = msm.add_missing_hydrogens(mol_system_2, pH=7.4)
msm.view(mol_system_2)

Implementation of a filter to import only the classes from installed modules.

By default, MolSysMT loads all the defined classes, but this shouldn't be the case.
MolSysMT should load only those classes belonging to modules already installed in the environment.
Otherwise, every defined class appears to be a dependency.
This needs to be fixed before submitting a manuscript introducing this tool.
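This is a minimal sketch of such a filter, assuming class forms are named 'package.Class' (as in 'mdtraj.Trajectory'). `importlib.util.find_spec` checks whether the top-level package is installed without importing it. That keeps the check cheap and avoids pulling in heavy dependencies at import time. `installed_forms` is a hypothetical name, and file and string forms would bypass this filter.

```python
import importlib.util


def installed_forms(form_names):
    """Keep only the class forms whose top-level package is installed."""
    available = []
    for form_name in form_names:
        package = form_name.split(".")[0]
        # find_spec locates a top-level package without executing it
        if importlib.util.find_spec(package) is not None:
            available.append(form_name)
    return available


print(installed_forms(["mdtraj.Trajectory", "openmm.Topology", "rdkit.Chem.rdchem.Mol"]))
```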
