Code Monkey home page Code Monkey logo

bridgeport's Introduction

Bridgeport - A python package for automated preparation of protein simulations.

Overview

The Bridgeport module serves as a comprehensive tool for preparing crystal structures for simulation using OpenMM. It integrates several steps necessary for the preparation of both protein and ligand molecules, ensuring they are ready for molecular dynamics simulation. The process includes alignment to a reference structure, separation of ligand and protein, repair and addition of missing residues and atoms, protonation, solvation, and the generation of a complete OpenMM system ready for simulation. This module streamlines the preparation process, combining functionalities from various tools and libraries such as MDAnalysis, mdtraj, OpenMM, PDBFixer, and RDKit, into a singular, cohesive workflow.

alt text

Environment

Construct the conda environment with the following command

conda env create -f conda_setup.yml

BridgePort Usage

Bridgeport can be easily run in 2 lines of code:

BP = Bridgeport(input_json='Bridgeport_input.json')
BP.run()

Input .json files

All the input files and parameters are specifid in the Bridgeprot_input.json:

First the working directory must be specified:

  • "working_dir": specifies where all output and intermediate directories will be created.

"ligand"

Ligands can be generated using an experimental structure. If the ligand is a small molecule, select the resname in the input.pdb with "lig_resname". If the ligand is a peptide, select the chainid in the input.pdb with "peptide_chainid".

Bridgeprot is also capable of generating ligand analogues based on a known ligand-protein complex. This workflow consists of forcing the maximum common substructure (MCS) between the analogue and the known structure to have matching torsions and the MCS of the analogue is aligned to that of the known ligand. Then, then a specified number of conformations of functional groups unique to the analogue are generated that maintain a certain RMSD threshold of the MCS and any other specified atoms that are matching in the known ligand and analogue. Then, each conformation is prepared in complex with the receptor and minimized to avoid steric clashes between the receptor and analogue. The lowest potential energy conformation, after minimization, becomes the final conformation of the analogue.

If you want to generate an analogue of a known structure: Make sure lig_resname and peptide_chain are set to false. Provide the SMILES string for you analogue with the "analogue_smiles" key. Provide the name of your analogue with "analogue_name". Provide a path to the known structure with "known_structure" that contains the experimental ligand to align analogue to with a "known_resname" or "known_chainid".

  • "lig_resname": specifies the ligand resname in the input.pdb file (Use if small molecule). If the ligand is a peptide choose "false". Ligand resnames that start with and number are not easily recognized by MDAnalysis (which is used for parsing), so we recommend changing the ligand resname in that case.
  • "peptide_chain": If ligand is a peptide, specify the letter code that denotes the ligand, if not choose "false".
  • "peptide_fasta": If ligand is a peptide, specify the path to the .fasta file to repair the peptide if desired. If no repair is desired, choose "false", or remove argument.
  • "known_smiles": Mandatory entry of smiles string of the ligand (either small molecule or peptide) in the input.pdb file
  • "peptide_nonstandard_resids": If non-standard residues are present in the peptide, present the indicies of the non-standard residues in ascending order, 0-indexed. This step is only necessary if a .fasta file is provided as well.
  • "analogue_smiles": String of smiles that represent the analogue to generate. Optional argument.
  • "analogue_name": Name to generate new files with.
  • "known_structure": Path to pdb file that contains the known ligand to align analogue to. This should be the same path as "protein" "input_pdb" to start, Bridgeport will correct automatically later.
  • "known_resname": Resname of ligand to parse in known_structure, if known ligand is a small molecule.
  • "known_chainid": Chainid of ligand to parse in known_structure, if known ligand is a peptide.
  • "add_known_atoms": List of atom names that are also in analogue, but not recognized in the MCS. These atoms are used to compare the RMSD during conformer generation.
  • "add_known_resids": List of resids that match each atom in "add_known_atoms". This should always be specified, but is especially helpful in cases where the known ligand is a peptide and each residue may have the same atoms names (ex: CA, N, C, O, etc.)
  • "remove_analogue_atoms": List of atom names that are found in the MCS, but are unwanted in the MCS due to user discretion.
  • "add_analogue_atoms": List of atoms names that match each atom in "add_known_atoms". For example if "add_known_atoms" is ['C21', 'C23', 'N5'] than add_analogue_atoms should be a list of the equivalent atoms in the analogue structure, such as ['C22', 'C24', 'N5']. It may be helpful to run Bridgeport without specifying any of the "add" arguments to generate the .pdb file of the ligand to better understand the atom names that will be generated from the provided "analogue_smiles".
  • "analogue_rmsd_thres"*: RMSD threshold that analogue conformer generation cannot exceed. RMSD is calculated of MCS and other specified atoms in "add_known_atoms", "add_known_resids", and "add_analogue_atoms"
  • "small_molecule_params": If analogue is a peptide, setting this option to true will choose to parameterize it as a small molecule with Smirnoff from OpenFF.
  • "align_all": If true, use both atoms identified in MCS and atoms specified in "add_known_atoms", "add_analogue_atoms" for alignment. Default is false, which won't use additional atoms specified in "add_known_atoms", "add_analogue_atoms" for alignment, only using them for assigning matching torsions.

"protein"

  • "input_pdb_dir": Path to directory where the input .pdb can be found.

  • "input_pdb": Name of .pdb file to use as an input structure.

  • "chains": One letter code of chain to parse from file. There is currently not an option to select multiple chains.

  • "RepairProtein" There is currently not an option to skip protein repair.

    • "working_dir": Path to directory to put all the modeller intermediates.
    • "fasta_path": Path to .fasta file to repair the input protein structure.
    • "tails": List of indices to parse the extra tails. EX: [30, 479]
    • "loops": 2-D List of indices that specify lower and upper bounds of loops to optimize during refinement. Loop optimization can take a while, but if skipped, unbonded output structures will result.
    • "engineered_resids": List of resids that are known engineered mutations in the crystal pdb. Adding this argument may prevent sequence errors in the RepairProtein section of Bridgeport.
    • "secondary_template": Path to secondary .pdb to use as a reference to accurately model large portions that are missing in the input .pdb structure.

"environment"

  • "membrane": If membrane should be specified choose "true", and make sure that "alignment_ref" argument is the appropriate OPM structure. Default is false.
  • "pH": Specify the pH. Default is 7.0.
  • "ion_strength": Specify the concentration of NaCl ions (in Molar). Default is 0.15 M.
  • "alignment_structure": Path to structure to align the final system to.
  • "alignment_chains": List of chains in the alignment structure to use for alignment. EX: ["A"]

Examples

7vvk (membrane, peptide ligand)

This example uses the PDBID 7vvk as the input structure and repairs both the protein and the peptide ligand.

8jr9 (membrane, small-molecule ligand, secondary template)

This example uses the PDBID 8jr9 as the input structure and repairs the protein with the secondary template of the repaired 7vvk.pdb structure to properly model the extracellular domain. This example uses a small molecule ligand.

5zty analogue (membrane, analogue, small-molecule ligand)

This example uses the protein and ligand from the PDBID 5zty but creates the final ligand from a smiles string and alignes maximum common substructure to ligand in 5zty.

Leu-Enkphalin (membrane, small-peptide ligand that is an analogue of Endomorphin-1 (8f7r))

This example showcases how to use the small molecule alignment, including specifying extra atoms to match that are not recognized as the maximum common substructure from rdkit.

bridgeport's People

Contributors

davidcooper7 avatar josephdepaoloboisvert avatar annachiw avatar

Watchers

David Minh avatar  avatar  avatar

bridgeport's Issues

Add Apo Functionality

Currently there must be some definition of what the ligand is. There should be a way to parameterize single or multi chain apo proteins.

Symmetry Issues

The alignment of analogues to references may be improved by testing different symmetries of aligned groups (i.e. benzene rings with different substitutions).

Parameterizing Peptide Ligands with OpenFF

Parameterizing the entire peptide ligand with OpenFF takes far too long.
The next workaround to try is to parameterize only the nonstandard Amino Acid with Openff, and the rest with AMBERFF14SB

Allow_Undefined_Stereo

work the allow_undefined_stereo keyword argument into the main method of ForceFieldHandler, so that it can be argumentatively provided to rdkit/openff

Restraints to Ligand in Step1

In step 1 of the MotorRow equilibration protocol, add restraints to the ligand (for analogues) so that the heavy atoms of the ligand which are part of the maximum common substructure are restrained to their initial positions as well.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.