drugbud-suite / dockm8

All-in-one Structure-Based Virtual Screening workflow based on the concept of consensus docking.

Home Page: https://drugbud-suite.github.io/dockm8-web/

License: GNU General Public License v3.0

Dockerfile 0.01% Jupyter Notebook 98.79% Python 1.18% Shell 0.03%
cheminformatics cheminformatics-and-compchem compchem computational-chemistry docking drug-design drug-discovery molecular-docking molecular-modeling protein-ligand-interactions

dockm8's People

Contributors

hamzaibrahim21, tonylac77

dockm8's Issues

Allow for active compound retrieval from ChEMBL and other datasets for decoy generation

Collect the known actives (positive controls) for retrospective analysis. For a given target, these can be found in the scientific literature, patent literature or public databases such as IUPHAR/BPS [114], ChEMBL [115] or ZINC [9, 99, 116], or may be available in-house.

While it may be possible to find dozens of actives, it is likely that many come from the same chemical series. For a rigorous control analysis, redundant (i.e., highly similar) compounds should be clustered and the most potent compound selected.
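
The clustering step described above can be sketched in plain Python. This is only an illustration under stated assumptions: each active is represented by a set of fingerprint on-bits (in practice these would come from a cheminformatics toolkit), potency is given as pIC50, and the 0.7 Tanimoto threshold is arbitrary. The selection is a greedy leader-style pass, not necessarily the clustering method the workflow would adopt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Active:
    """A known active; `features` stands in for a fingerprint's on-bits."""
    name: str
    features: frozenset
    pic50: float  # assumption: higher pIC50 means more potent


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_representatives(actives, threshold=0.7):
    """Visit actives from most to least potent and keep a compound only if it
    is less similar than `threshold` to every compound already kept."""
    kept = []
    for cand in sorted(actives, key=lambda a: a.pic50, reverse=True):
        if all(tanimoto(cand.features, k.features) < threshold for k in kept):
            kept.append(cand)
    return kept


# Toy data: A1 and A2 share most bits (same series); B1 is unrelated.
actives = [
    Active("A1", frozenset({1, 2, 3, 4, 5, 6}), 7.9),
    Active("A2", frozenset({1, 2, 3, 4, 5, 7}), 8.4),
    Active("B1", frozenset({10, 11, 12}), 6.5),
]
representatives = [a.name for a in select_representatives(actives)]
print(representatives)
```

Because compounds are visited in order of decreasing potency, the compound retained for each group of near-duplicates is always its most potent member.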

Implement Active learning for docking score prediction

Implement one or several of the available active-learning libraries for docking score prediction

Implement this as a wrapper around DockM8: the first iteration selects compounds at random, then runs the DockM8 workflow on that selection.

Scores should be standardized according to the 1st and 99th percentiles (or user-selectable thresholds) to ensure consistency across the chemical space of the dataset.
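
One possible reading of this percentile-based standardization is to clip scores to the chosen percentiles and rescale them to the range 0 to 1. The sketch below assumes that interpretation; the function names and the linear-interpolation percentile are illustrative choices, not part of DockM8.

```python
def percentile(values, q):
    """Percentile q (0-100), linearly interpolated between sorted values."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def standardize_scores(scores, lower_q=1.0, upper_q=99.0):
    """Clip scores to the [lower_q, upper_q] percentiles, then rescale to [0, 1]."""
    low = percentile(scores, lower_q)
    high = percentile(scores, upper_q)
    if high == low:
        # Degenerate case: all scores identical, nothing to rescale.
        return [0.0 for _ in scores]
    return [(min(max(s, low), high) - low) / (high - low) for s in scores]


# Toy docking scores with one extreme outlier (30.0).
raw = [-12.0, -9.5, -9.0, -8.2, -7.5, -7.0, -6.1, -5.0, -4.2, 30.0]
scaled = standardize_scores(raw)
print([round(s, 3) for s in scaled])
```

Clipping before rescaling keeps a single extreme score from compressing the rest of the distribution, which is the usual motivation for percentile bounds over a plain min-max scaling.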

Implement --mode in dockm8.py to handle the active learning mode.
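
A minimal sketch of such a switch using argparse follows. The mode names ("single", "active_learning") and the handler functions are hypothetical placeholders; the actual command-line interface of dockm8.py may differ.

```python
import argparse

# Hypothetical mode names, chosen only for this illustration.
MODES = ("single", "active_learning")


def build_parser():
    """Parser exposing a --mode option restricted to the known modes."""
    parser = argparse.ArgumentParser(description="Sketch of a --mode switch")
    parser.add_argument(
        "--mode",
        choices=MODES,
        default="single",
        help="workflow variant to run (default: %(default)s)",
    )
    return parser


def dispatch(mode):
    """Map each mode to a handler; the handlers here only return a label."""
    handlers = {
        "single": lambda: "standard workflow",
        "active_learning": lambda: "active-learning loop",
    }
    return handlers[mode]()


args = build_parser().parse_args(["--mode", "active_learning"])
print(dispatch(args.mode))
```

Using `choices` lets argparse reject unknown modes with a clear error before any docking work starts.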

Implement Pytest testing framework

  • Provide test files (.sdf library and .pdb file)
  • Set up pytest to automatically run DockM8 and check produced files
  • Be mindful of computation time on GitHub Actions (may be costly)
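
The file check in the list above could look roughly like the sketch below. The expected file names are hypothetical, and `run_workflow_stub` stands in for a real DockM8 run on a small test library so that the example stays fast and self-contained.

```python
from pathlib import Path

# Hypothetical output names; the real workflow's file names may differ.
EXPECTED_OUTPUTS = ("best_hits.sdf", "consensus_scores.csv")


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or empty in output_dir."""
    out = Path(output_dir)
    return [
        name
        for name in expected
        if not (out / name).is_file() or (out / name).stat().st_size == 0
    ]


def run_workflow_stub(output_dir):
    """Stand-in for launching DockM8 on a tiny .sdf library and .pdb file."""
    for name in EXPECTED_OUTPUTS:
        (Path(output_dir) / name).write_text("placeholder\n")


def test_workflow_writes_expected_files(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory fixture.
    run_workflow_stub(tmp_path)
    assert missing_outputs(tmp_path) == []
```

Keeping the test library to a handful of molecules is one way to limit the GitHub Actions cost mentioned in the list.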

Add better reporting after docking run

Generate a single SDF with best hits

  • Docked pose
  • DockM8 score
  • Individual scoring function scores
  • Descriptors
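
As a rough illustration of the single-SDF report, the sketch below appends data items to an existing molblock and concatenates the top-ranked records. The property names, the placeholder molblock, and the assumption that a higher DockM8 score is better are all hypothetical; a real implementation would more likely rely on a cheminformatics toolkit's SDF writer.

```python
# Placeholder single-atom molblock standing in for a docked pose.
MOLBLOCK = "\n".join([
    "placeholder",  # header line 1: molecule name
    "  sketch",     # header line 2: program / timestamp
    "",             # header line 3: comment
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])


def sdf_record(molblock, properties):
    """Build one SDF record: molblock, '> <NAME>' data items, then '$$$$'."""
    lines = [molblock.rstrip("\n")]
    for name, value in properties.items():
        # Each data item is a header line, the value, and a blank line.
        lines += [f"> <{name}>", str(value), ""]
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def best_hits_sdf(hits, top_n=10):
    """Concatenate records for the top_n hits ranked by 'DockM8_score'
    (assumed here: higher is better)."""
    ranked = sorted(
        hits, key=lambda h: h["properties"]["DockM8_score"], reverse=True
    )
    return "".join(sdf_record(h["molblock"], h["properties"]) for h in ranked[:top_n])


hits = [
    {"molblock": MOLBLOCK,
     "properties": {"DockM8_score": 0.42, "hypothetical_sf": -6.1}},
    {"molblock": MOLBLOCK,
     "properties": {"DockM8_score": 0.87, "hypothetical_sf": -8.3}},
]
sdf_text = best_hits_sdf(hits, top_n=1)
print(sdf_text)
```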

Output a PyMOL session file for visualisation?

Implement class structure

I've analyzed the Python scripts within the scripts folder of your repository. Here are some suggestions on how you can refactor these scripts into classes suitable for object-oriented programming (OOP):

Clustering Functions (clustering_functions.py):

Create a class Clusterer with methods for different clustering algorithms like kmedoids_clustering and affinity_propagation_clustering. This class can hold common attributes needed for clustering, such as the input DataFrame and any hyperparameters for the algorithms.

Clustering Metrics (clustering_metrics.py):

Consider a class ClusteringMetrics with static methods for each metric calculation like simpleRMSD_calc, spyRMSD_calc, espsim_calc, etc. This approach groups all related metric functions and makes it clear that these functions are closely related and perform similar roles.
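
A minimal sketch of the static-method grouping follows. The method name `simple_rmsd` and its signature are illustrative and are not the existing `simpleRMSD_calc`; the metric shown is a plain RMSD over atoms assumed to be listed in the same order, without alignment or symmetry correction.

```python
import math


class ClusteringMetrics:
    """Namespace-style class: every metric is a static method."""

    @staticmethod
    def simple_rmsd(coords_a, coords_b):
        """RMSD between two poses given as equal-length lists of (x, y, z)."""
        if len(coords_a) != len(coords_b) or not coords_a:
            raise ValueError("poses must be non-empty and have the same atom count")
        squared = sum(
            (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
        )
        return math.sqrt(squared / len(coords_a))


# Toy poses: pose_b is pose_a shifted by 1 unit along x.
pose_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
pose_b = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
rmsd = ClusteringMetrics.simple_rmsd(pose_a, pose_b)
print(rmsd)
```

Static methods need no instance state, so callers can use `ClusteringMetrics.simple_rmsd(...)` directly, much like calling a module-level function.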

Consensus Methods (consensus_methods.py):

A class ConsensusCalculator can encapsulate methods like ECR_best, ECR_avg, avg_ECR, etc. This class can also manage common resources or shared data needed across these methods, improving data encapsulation and reusability.
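
To illustrate how shared data could live on such a class, the sketch below stores per-function scores once and derives ranks from them. The `ecr` method implements the generic exponential consensus ranking formula (a sum over scoring functions of exp(-rank/sigma)/sigma); it is not claimed to match the existing `ECR_best`, `ECR_avg` or `avg_ECR` functions, and the data layout, `sigma` default, and "higher is better" convention are assumptions.

```python
import math


class ConsensusCalculator:
    """Holds scores shared by all consensus methods.

    scores: {scoring_function: {compound_id: score}}; higher is assumed better.
    """

    def __init__(self, scores, sigma=1.0):
        self.scores = scores
        self.sigma = sigma  # tunable width of the exponential weighting

    def ranks(self):
        """Rank compounds (1 = best) separately for each scoring function."""
        ranked = {}
        for function, by_compound in self.scores.items():
            ordered = sorted(by_compound, key=by_compound.get, reverse=True)
            ranked[function] = {cpd: r for r, cpd in enumerate(ordered, start=1)}
        return ranked

    def ecr(self):
        """Generic exponential consensus ranking score for each compound."""
        ranks = self.ranks()
        compounds = next(iter(self.scores.values()))
        return {
            cpd: sum(
                math.exp(-r[cpd] / self.sigma) / self.sigma for r in ranks.values()
            )
            for cpd in compounds
        }


scores = {
    "sf1": {"c1": 9.0, "c2": 7.0, "c3": 5.0},
    "sf2": {"c1": 8.0, "c2": 2.0, "c3": 8.5},
}
consensus = ConsensusCalculator(scores, sigma=1.0).ecr()
ranking = sorted(consensus, key=consensus.get, reverse=True)
print(ranking)
```

Other consensus methods could be added as further methods that reuse `ranks()`, so the ranking is defined in one place.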

Docking Functions:

Since the content wasn't provided, I recommend structuring docking-related functions into a class that might be named DockingProcessor. This class could manage docking operations, including preparing ligands and proteins, performing the docking process, and analyzing the results.

DoGSiteScorer Integration (dogsitescorer.py):

A class DoGSiteScorerAPI could encapsulate all functionalities required to interact with the DoGSiteScorer API, including submitting jobs, uploading PDB files, and fetching results. This class would act as an API client dedicated to DoGSiteScorer.

Pocket Detection (get_pocket.py):

A PocketDetector class can encapsulate functionalities for detecting and processing pockets in protein structures, leveraging the methods already defined in the script.

Library Preparation (library_preparation.py):

Consider a class LibraryPreparer that includes methods for molecule standardization, conformer generation, and any preprocessing steps required before docking simulations. This class centralizes the preprocessing steps into a single, reusable component.

Performance Calculation (performance_calculation.py):

A class PerformanceEvaluator could contain methods for calculating various performance metrics for docking or virtual screening results. This class would provide a structured way to assess the effectiveness of the docking process.

Utilities (utilities.py):

While utilities often remain as a collection of standalone functions, consider grouping related utilities into classes if they share common data or are used together frequently.

Use a database for handling molecules and avoiding repeat calculations

Use a database to store molecules and results. Each docking and scoring operation can be done on single molecules instead of split SDF files.

Need to be careful of I/O time becoming the bottleneck instead of CPU.
Each molecule can be written as a temp file for input into docking and rescoring.

Alternatively, write every molecule/pose to a single SDF file.
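
A minimal sketch of the database idea, using Python's built-in sqlite3 module, is shown below. The table layout, class name, and method names are assumptions made for illustration; the point is only that a result keyed by molecule and method is computed once and then reused.

```python
import sqlite3


class ResultStore:
    """Caches one score per (molecule, method) pair in SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS scores ("
            " mol_id TEXT NOT NULL,"
            " method TEXT NOT NULL,"
            " score REAL NOT NULL,"
            " PRIMARY KEY (mol_id, method))"
        )

    def get(self, mol_id, method):
        """Return the stored score, or None if it was never computed."""
        row = self.conn.execute(
            "SELECT score FROM scores WHERE mol_id = ? AND method = ?",
            (mol_id, method),
        ).fetchone()
        return None if row is None else row[0]

    def get_or_compute(self, mol_id, method, compute):
        """Reuse a stored score when present; otherwise compute and store it."""
        cached = self.get(mol_id, method)
        if cached is not None:
            return cached
        score = float(compute(mol_id))
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO scores VALUES (?, ?, ?)", (mol_id, method, score)
            )
        return score


calls = []


def fake_score(mol_id):
    """Stand-in for an expensive docking or rescoring call."""
    calls.append(mol_id)
    return -7.5


store = ResultStore()
first = store.get_or_compute("mol_001", "hypothetical_sf", fake_score)
second = store.get_or_compute("mol_001", "hypothetical_sf", fake_score)
print(first, second, calls)
```

The second call returns the stored value without invoking `fake_score` again. Whether the database lookups stay cheaper than the avoided computation would need to be measured, in line with the I/O concern raised above.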
