drugbud-suite / dockm8

All in one Structure-Based Virtual Screening workflow based on the concept of consensus docking.

Home Page: https://drugbud-suite.github.io/dockm8-web/

License: GNU General Public License v3.0

Dockerfile 0.01% Jupyter Notebook 98.84% Python 1.13% Shell 0.03%
cheminformatics cheminformatics-and-compchem compchem computational-chemistry docking drug-design drug-discovery molecular-docking molecular-modeling protein-ligand-interactions

dockm8's People

Contributors

hamzaibrahim21, tonylac77

dockm8's Issues

Use a database for handling molecules and avoiding repeated calculations

Use a database to store molecules and results. Each docking and scoring operation can be done on single molecules instead of split SDF files.

Need to be careful of I/O time becoming the bottleneck instead of CPU.
Each molecule can be written as a temp file for input into docking and rescoring.

Alternatively, write every molecule/pose to a single SDF file.
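The caching idea can be sketched with the standard-library sqlite3 module. This is only an illustration under assumptions: the table layout, the function names (open_store, add_molecule, cached_result) and the use of a hash of the mol block as key are all hypothetical and not part of DockM8.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small store for molecules and per-step results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS molecules ("
        "mol_id TEXT PRIMARY KEY, molblock TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "mol_id TEXT NOT NULL, step TEXT NOT NULL, value REAL NOT NULL, "
        "PRIMARY KEY (mol_id, step))"
    )
    return conn


def add_molecule(conn, molblock):
    """Store one molecule; the key is a hash of its text (an assumption)."""
    mol_id = hashlib.sha256(molblock.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO molecules VALUES (?, ?)", (mol_id, molblock)
    )
    return mol_id


def cached_result(conn, mol_id, step, compute):
    """Return the stored value for (mol_id, step), computing it only once."""
    row = conn.execute(
        "SELECT value FROM results WHERE mol_id = ? AND step = ?",
        (mol_id, step),
    ).fetchone()
    if row is not None:
        return row[0]
    value = float(compute())
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (mol_id, step, value))
    conn.commit()
    return value
```

Committing after every result keeps the sketch simple, but it is exactly the kind of I/O cost noted above; batching several inserts per commit would likely be needed for large libraries.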

Implement New scoring functions

  • GenScore: A scoring framework for predicting protein-ligand binding affinity.
  • ITScoreAff: iterative knowledge-based scoring function for protein-ligand interactions by considering binding affinity information
  • DLIGAND2
  • CENsible
  • PANTHER
  • ConBAP
  • HACNet
  • GBScore

Implement class structure

I've analyzed the Python scripts within the scripts folder of your repository. Here are some suggestions on how you can refactor these scripts into classes suitable for object-oriented programming (OOP):

Clustering Functions (clustering_functions.py):

Create a class Clusterer with methods for different clustering algorithms like kmedoids_clustering and affinity_propagation_clustering. This class can hold common attributes needed for clustering, such as the input DataFrame and any hyperparameters for the algorithms.

Clustering Metrics (clustering_metrics.py):

Consider a class ClusteringMetrics with static methods for each metric calculation like simpleRMSD_calc, spyRMSD_calc, espsim_calc, etc. This approach groups all related metric functions and makes it clear that these functions are closely related and perform similar roles.

Consensus Methods (consensus_methods.py):

A class ConsensusCalculator can encapsulate methods like ECR_best, ECR_avg, avg_ECR, etc. This class can also manage common resources or shared data needed across these methods, improving data encapsulation and reusability.

Docking Functions:

Since the content wasn't provided, I recommend structuring docking-related functions into a class that might be named DockingProcessor. This class could manage docking operations, including preparing ligands and proteins, performing the docking process, and analyzing the results.

DoGSiteScorer Integration (dogsitescorer.py):

A class DoGSiteScorerAPI could encapsulate all functionalities required to interact with the DoGSiteScorer API, including submitting jobs, uploading PDB files, and fetching results. This class would act as an API client dedicated to DoGSiteScorer.

Pocket Detection (get_pocket.py):

A PocketDetector class can encapsulate functionalities for detecting and processing pockets in protein structures, leveraging the methods already defined in the script.

Library Preparation (library_preparation.py):

Consider a class LibraryPreparer that includes methods for molecule standardization, conformer generation, and any preprocessing steps required before docking simulations. This class centralizes the preprocessing steps into a single, reusable component.

Performance Calculation (performance_calculation.py):

A class PerformanceEvaluator could contain methods for calculating various performance metrics for docking or virtual screening results. This class would provide a structured way to assess the effectiveness of the docking process.

Utilities (utilities.py):

While utilities often remain as a collection of standalone functions, consider grouping related utilities into classes if they share common data or are used together frequently.
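As a rough illustration of the pattern suggested for ConsensusCalculator, the sketch below keeps the shared score data on the instance and exposes each consensus method as a method of the class. The data layout and the average_rank method are assumptions made for the example; average_rank is a plain mean of ranks and is not one of the existing DockM8 methods such as ECR_best.

```python
from dataclasses import dataclass, field


@dataclass
class ConsensusCalculator:
    """Holds the per-compound scores shared by all consensus methods."""

    # scores[function_name][compound_id] = score (lower is assumed better)
    scores: dict = field(default_factory=dict)

    def ranks(self, function_name):
        """Rank compounds for one scoring function, best (lowest) first."""
        values = self.scores[function_name]
        ordered = sorted(values, key=values.get)
        return {cid: pos for pos, cid in enumerate(ordered, start=1)}

    def average_rank(self):
        """Hypothetical consensus method: mean rank over all functions."""
        all_ranks = [self.ranks(name) for name in self.scores]
        if not all_ranks:
            return {}
        # Only compounds scored by every function get a consensus value.
        compounds = set.intersection(*(set(r) for r in all_ranks))
        return {
            cid: sum(r[cid] for r in all_ranks) / len(all_ranks)
            for cid in compounds
        }
```

Further methods would be added alongside average_rank and reuse ranks(), which is the main benefit of moving the shared data onto the class.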

Implement Active learning for docking score prediction

Implement one or several of the available active-learning libraries for docking score prediction

Implement this as a wrapper around DockM8: the first iteration uses random compound selection, then the DockM8 workflow is run.

Scores should be standardized according to the 1st and 99th percentile (or choosable threshold) to ensure consistency across the chemical space of the dataset.
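One possible reading of this standardization is a min-max scaling between the two percentiles, with values outside that range clipped. The sketch below follows that reading; the function names and the clipping behaviour are assumptions, and the percentile uses simple linear interpolation.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (
        sorted_values[upper] - sorted_values[lower]
    )


def standardize_scores(scores, low_q=1.0, high_q=99.0):
    """Scale scores to [0, 1] between two percentiles, clipping outliers."""
    ordered = sorted(scores)
    low = percentile(ordered, low_q)
    high = percentile(ordered, high_q)
    if high == low:
        # All scores are (nearly) identical; nothing to scale.
        return [0.0 for _ in scores]
    return [min(1.0, max(0.0, (s - low) / (high - low))) for s in scores]
```

The low_q and high_q arguments correspond to the "choosable threshold" mentioned above.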

Implement --mode in dockm8.py to handle the active learning mode.

Implement Pytest testing framework

  • Provide test files (.sdf library and .pdb file)
  • Organise test files better
  • Subsections
    • docking_postprocessing
    • library_preparation
    • pocket_finding
    • protein_preparation
    • config parsing
    • docking
    • clustering functions
    • clustering metrics
    • consensus
    • scoring functions
    • performance / postprocessing
  • Set up pytest to automatically run DockM8 and check the produced files
    Be mindful of computation time on GitHub Actions (may be costly)
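A check of the produced files could look roughly like the sketch below. The helper missing_outputs and the output file names are hypothetical; the test writes placeholder files instead of running DockM8, so it only shows the shape of such a test. tmp_path is pytest's built-in temporary-directory fixture.

```python
from pathlib import Path


def missing_outputs(w_dir, expected_names):
    """Return the expected output files that are absent or empty in w_dir."""
    w_dir = Path(w_dir)
    return [
        name
        for name in expected_names
        if not (w_dir / name).is_file() or (w_dir / name).stat().st_size == 0
    ]


def test_workflow_outputs(tmp_path):
    # Hypothetical file names; replace with the files DockM8 really writes.
    expected = ["prepared_library.sdf", "consensus_results.csv"]
    # Stand-in for running the workflow with tmp_path as working directory.
    for name in expected:
        (tmp_path / name).write_text("placeholder\n")
    assert missing_outputs(tmp_path, expected) == []
```

Keeping the file check in a plain helper makes it cheap to test on its own, so the expensive full-workflow run can be limited to a small test library.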

Error while running dockm8.ipynb

if prepare_protein == True:
    prepared_receptor = prepare_protein_protoss(receptor)
else:
    prepared_receptor = receptor

#Create a temporary folder for all further calculations
w_dir = prepared_receptor.parent / prepared_receptor.stem
print('The working directory has been set to:', w_dir)
(w_dir).mkdir(exist_ok=True)

if pocket == 'Reference':
    pocket_definition = get_pocket(ref_file, prepared_receptor, 10)
    print(pocket_definition)
if pocket == 'RoG':
    pocket_definition = get_pocket_RoG(ref_file, prepared_receptor)
    print(pocket_definition)
elif pocket == 'Dogsitescorer':
    pocket_definition = binding_site_coordinates_dogsitescorer(prepared_receptor, w_dir, method='Volume')

This code gave the following error.

{
"name": "JSONDecodeError",
"message": "Expecting value: line 1 column 1 (char 0)",
"stack": "---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
File ~/opt/anaconda3/envs/dockm8/lib/python3.10/site-packages/requests/models.py:974, in Response.json(self, **kwargs)
973 try:
--> 974 return complexjson.loads(self.text, **kwargs)
975 except JSONDecodeError as e:
976 # Catch JSON-related errors and raise as requests.JSONDecodeError
977 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of s (a str instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError Traceback (most recent call last)
Cell In[4], line 2
1 if prepare_protein == True:
----> 2 prepared_receptor = prepare_protein_protoss(receptor)
3 else:
4 prepared_receptor = receptor

File ~/Desktop/tik/DockM8/scripts/protein_preparation.py:81, in prepare_protein_protoss(receptor)
79 query = {'protein_file': upload_file}
80 # Submit the job to ProtoSS and get the job submission response
---> 81 job_submission = requests.post(PROTOSS, files=query).json()
83 # Poll the job status until it is completed
84 protoss_job = poll_job(job_submission['job_id'], PROTOSS_JOBS)

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/site-packages/requests/models.py:978, in Response.json(self, **kwargs)
974 return complexjson.loads(self.text, **kwargs)
975 except JSONDecodeError as e:
976 # Catch JSON-related errors and raise as requests.JSONDecodeError
977 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 978 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)"
}
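The message "Expecting value: line 1 column 1 (char 0)" means the body returned by the server did not start with a JSON value, for example because it was empty or an HTML error page. The traceback does not show which, so a more defensive version of the submission step could inspect the response before decoding it. The sketch below is an assumption-laden illustration: parse_job_submission is a hypothetical helper, and it relies only on the status_code, text and json() members of the response object and on the job_id key used in the traceback.

```python
def parse_job_submission(response):
    """Return the decoded JSON body, or raise RuntimeError with context."""
    if response.status_code >= 400:
        raise RuntimeError(
            f"Job submission failed with HTTP {response.status_code}"
        )
    try:
        payload = response.json()
    except ValueError as err:
        # JSON decoding errors are ValueError subclasses, so this also
        # covers the JSONDecodeError seen in the traceback.
        snippet = response.text[:200]
        raise RuntimeError(
            f"Server did not return JSON; body starts with: {snippet!r}"
        ) from err
    if "job_id" not in payload:
        raise RuntimeError(f"No job_id in server response: {payload!r}")
    return payload
```

It would be called as parse_job_submission(requests.post(PROTOSS, files=query)) in place of the direct .json() call, so that a failing or unreachable service produces a readable error instead of a decoding traceback.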

Create multipage streamlit app

Add pages for

  • Welcome
  • Library Analysis and Filtering
  • Library Preparation
  • Protein Fetching and Analysis
  • Protein Preparation
  • Binding Site Detection
  • Docking
  • Docking Postprocessing
  • Pose Selection
  • Rescoring
  • Consensus
  • DockM8 reporting

Also add a Guided Mode vs. Expert Mode option
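Streamlit's directory-based multipage apps pick up scripts from a pages/ folder next to the entry-point script, ordering them by a numeric filename prefix and turning underscores into spaces in the sidebar label. Assuming that layout is used, the page list above maps to filenames as sketched here; the helper names are hypothetical, and the zero-padded prefix is a choice made for the example.

```python
PAGES = [
    "Welcome",
    "Library Analysis and Filtering",
    "Library Preparation",
    "Protein Fetching and Analysis",
    "Protein Preparation",
    "Binding Site Detection",
    "Docking",
    "Docking Postprocessing",
    "Pose Selection",
    "Rescoring",
    "Consensus",
    "DockM8 reporting",
]


def page_filename(position, title):
    """Build a '<number>_<label>.py' filename for one page."""
    return f"{position:02d}_{title.replace(' ', '_')}.py"


def page_layout(titles):
    """Return the relative script paths for all pages, in order."""
    return [
        f"pages/{page_filename(i, title)}"
        for i, title in enumerate(titles, start=1)
    ]


if __name__ == "__main__":
    for path in page_layout(PAGES):
        print(path)
```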

Allow for active compound retrieval from ChEMBL and other datasets for decoy generation

Collect the known actives (positive controls) for retrospective analysis. For a given target, these can be found in the scientific literature, patent literature or public databases such as IUPHAR/BPS, ChEMBL or ZINC, or available in-house.

While it may be possible to find dozens of actives, it is likely that many come from the same chemical series. For a rigorous control analysis, redundant (i.e., highly similar) compounds should be clustered and the most potent compound selected.
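A simple way to approximate "cluster and keep the most potent" is a greedy, leader-style selection: walk through the actives from most to least potent and keep a compound only if it is not too similar to one already kept. This is a swapped-in simplification rather than a full clustering, and everything in the sketch is an assumption: the function name, the 0.7 threshold, and the similarity callable, which in practice might be a Tanimoto similarity on fingerprints.

```python
def select_representatives(actives, similarity, threshold=0.7):
    """Greedy selection of non-redundant actives, most potent first.

    actives:    list of (compound_id, potency); a higher potency value is
                assumed to mean a more potent compound (e.g. pIC50).
    similarity: callable (id_a, id_b) -> float in [0, 1].
    """
    kept = []
    for cid, potency in sorted(actives, key=lambda a: a[1], reverse=True):
        # Keep the compound only if it is dissimilar to every one kept so far.
        if all(similarity(cid, other) < threshold for other, _ in kept):
            kept.append((cid, potency))
    return kept
```

Because the list is processed in order of potency, each kept compound is the most potent member of its neighbourhood.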

Fix library preparation missing compounds

Some compounds are dropped during either protonation or conformer generation. Investigate why, and fall back to an alternative method when one of the two methods fails.
Ensure the prepared library is the same length as the input.
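The two parts of this issue, detecting dropped compounds and falling back to another method, can be sketched independently of any chemistry toolkit. The function names and the (name, callable) method list below are hypothetical; the callables stand in for whatever protonation or conformer-generation routines are actually used.

```python
def find_dropped(input_ids, prepared_ids):
    """Return the input compound IDs missing from the prepared library."""
    prepared = set(prepared_ids)
    return [cid for cid in input_ids if cid not in prepared]


def prepare_with_fallback(compound, methods):
    """Try each (name, callable) in order; return the first success.

    A method counts as failed if it raises or returns None.
    """
    errors = {}
    for name, method in methods:
        try:
            result = method(compound)
        except Exception as err:  # broad on purpose: any failure triggers fallback
            errors[name] = repr(err)
            continue
        if result is not None:
            return name, result
        errors[name] = "returned None"
    raise RuntimeError(f"All methods failed for {compound!r}: {errors}")
```

Running find_dropped after each preparation step would show at which step compounds disappear, and an empty result at the end confirms the prepared library matches the input.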
