The dockm8 from drugbud-suite

Use a database for handling molecules and avoiding repeat calculations

Use a database to store molecules and results. Each docking and scoring operation can be done on single molecules instead of split SDF files.

Be careful that I/O time does not become the bottleneck instead of CPU.
Each molecule can be written as a temp file for input into docking and rescoring.

Alternatively, write every molecule/pose to a single SDF file.
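A minimal sketch of how a database could avoid repeat calculations, assuming SQLite and a key derived from a canonical SMILES string. The table layout and the names `open_store`, `mol_key` and `cached_result` are hypothetical and not part of DockM8.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the results database; the schema is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " mol_key TEXT, step TEXT, value REAL,"
        " PRIMARY KEY (mol_key, step))"
    )
    return conn


def mol_key(smiles):
    """Hash a (canonical) SMILES string into a stable lookup key."""
    return hashlib.sha256(smiles.encode("utf-8")).hexdigest()


def cached_result(conn, smiles, step, compute):
    """Return the stored value for (molecule, step), computing it only once."""
    key = mol_key(smiles)
    row = conn.execute(
        "SELECT value FROM results WHERE mol_key = ? AND step = ?", (key, step)
    ).fetchone()
    if row is not None:
        return row[0]
    value = float(compute(smiles))
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (key, step, value))
    conn.commit()
    return value
```

Committing after every insert is simple but may itself become an I/O cost on large libraries; batching writes would be one way to reduce it.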

Add ability to perform LBVS before docking

Pharmacophore generation and filtering of library
- Automatic PH4 generation from crystal structure
  - GRAIL
  - apo2ph4
- PH4 generation from library of compounds
Fingerprint similarity to given library of actives
Shape Based screening (https://cdpkit.org/master/applications/shapescreen.html)
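As an illustration of the fingerprint similarity option, here is a sketch that assumes fingerprints have already been computed elsewhere and are available as sets of on-bit indices. The function names and the 0.5 threshold are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_filter(library, actives, threshold=0.5):
    """Keep library entries whose best similarity to any active meets the threshold.

    library: dict mapping molecule name -> set of on-bits
    actives: list of sets of on-bits for the known actives
    """
    kept = []
    for name, fp in library.items():
        best = max((tanimoto(fp, active) for active in actives), default=0.0)
        if best >= threshold:
            kept.append((name, best))
    # Most similar compounds first
    return sorted(kept, key=lambda item: item[1], reverse=True)
```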

Split docking scripts so that each program has its own script

Add automatic protein structure retrieval using UniProt ID and PDBminer

PDBminer github

Allow for input of a PDB ID or a UniProt ID, and have DockM8 search for the ideal structure to use

Add faster docking methods

https://github.com/dptech-corp/Uni-Dock
https://github.com/schrojunzhang/KarmaDock
https://github.com/molecularmodelinglab/plantain
FABind+
PANTHER
rDock: install via bioconda (rdock), then see https://rdock.github.io/how-to-run-rdock-in-parallel/ for running it in parallel

Improve docking pose post processing and filtering

Use config file instead of a long list of arguments for dockm8.py

gui.py must modify the config file
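One possible shape for this, sketched with a JSON file so that both dockm8.py and gui.py can read and write the same settings. The keys in `DEFAULTS` and the function names are assumptions for illustration, not the actual DockM8 options.

```python
import json
from pathlib import Path

# Hypothetical option names and defaults, for illustration only
DEFAULTS = {"docking_programs": [], "n_poses": 10, "pocket": "Reference"}


def load_config(path):
    """Read a JSON config file and fill in defaults for missing keys."""
    user = json.loads(Path(path).read_text())
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    config = dict(DEFAULTS)
    config.update(user)
    return config


def save_config(path, config):
    """Write the config back to disk, e.g. after it was edited in the GUI."""
    Path(path).write_text(json.dumps(config, indent=2))
```

Rejecting unknown keys catches typos in the file early, at the cost of having to update `DEFAULTS` whenever an option is added.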

Allow library property prefiltering before docking

Add the ability to filter the library before running docking predictions.
MW, n_heavy_atoms, n_rotatable_bonds, etc.
PAINS
https://chemfh.scbdd.com/apis/
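A sketch of the property prefilter, assuming the descriptors (MW, n_heavy_atoms, n_rotatable_bonds) have already been calculated for each molecule. The threshold values and names are hypothetical placeholders, and PAINS filtering is not covered here.

```python
# Hypothetical thresholds: (minimum, maximum) per descriptor; None means unbounded
FILTERS = {
    "MW": (None, 500.0),
    "n_heavy_atoms": (10, 50),
    "n_rotatable_bonds": (None, 10),
}


def passes(descriptors, filters=FILTERS):
    """Return True if every filtered descriptor lies within its allowed range."""
    for name, (low, high) in filters.items():
        value = descriptors[name]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def prefilter(library, filters=FILTERS):
    """Return the IDs of molecules that pass all property filters.

    library: dict mapping molecule ID -> dict of precomputed descriptors
    """
    return [mol_id for mol_id, desc in library.items() if passes(desc, filters)]
```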

Reorganise library preparation scripts into a folder

Implement Python logging instead of custom printlog function

Python already has a built-in logging module.

Replace all printlog calls with calls to the Python logging module.
Make sure the log file is cleared at the start of every DockM8 run.
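A minimal sketch of both points using the standard library: opening the `FileHandler` with `mode="w"` truncates the previous log at the start of each run. The logger name "dockm8" and the function name `setup_logging` are assumptions.

```python
import logging


def setup_logging(log_path, level=logging.INFO):
    """Configure a file logger whose log file is cleared on every call."""
    logger = logging.getLogger("dockm8")
    logger.setLevel(level)
    # Remove handlers left over from a previous call (e.g. in a notebook session)
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    # mode="w" truncates any existing log file instead of appending to it
    file_handler = logging.FileHandler(log_path, mode="w")
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)
    return logger
```

Each former `printlog(message)` call would then become something like `logger.info(message)`.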

Fix docking generating incorrect number of poses

For test file 1fvv, docking generates too few poses; investigate.

Implement New scoring functions

GenScore: A scoring framework for predicting protein-ligand binding affinity.
ITScoreAff: iterative knowledge-based scoring function for protein-ligand interactions by considering binding affinity information
DLIGAND2
CENsible
PANTHER
ConBAP
HACNet
GBScore

Add better reporting after docking run

Generate a single SDF with best hits

Docked pose
DockM8 score
Individual scoring function scores
Descriptors
For ensemble docking use : https://durrantlab.pitt.edu/enopt/
Output a pymol session file for visualisation?

Improve conformer generation

CDPKit single conformer
CDPKit conformer ensembles
Other methods : [https://drugbud-suite.github.io/CADD_Vault/Cheminformatics/Conformer%20Generation/]

I've analyzed the Python scripts within the scripts folder of your repository. Here are some suggestions on how you can refactor these scripts into classes suitable for object-oriented programming (OOP):

Clustering Functions (clustering_functions.py):

Create a class Clusterer with methods for different clustering algorithms like kmedoids_clustering and affinity_propagation_clustering. This class can hold common attributes needed for clustering, such as the input DataFrame and any hyperparameters for the algorithms.

Clustering Metrics (clustering_metrics.py):

Consider a class ClusteringMetrics with static methods for each metric calculation like simpleRMSD_calc, spyRMSD_calc, espsim_calc, etc. This approach groups all related metric functions and makes it clear that these functions are closely related and perform similar roles.

Consensus Methods (consensus_methods.py):

A class ConsensusCalculator can encapsulate methods like ECR_best, ECR_avg, avg_ECR, etc. This class can also manage common resources or shared data needed across these methods, improving data encapsulation and reusability.

Docking Functions:

Since the content wasn't provided, I recommend structuring docking-related functions into a class that might be named DockingProcessor. This class could manage docking operations, including preparing ligands and proteins, performing the docking process, and analyzing the results.

DoGSiteScorer Integration (dogsitescorer.py):

A class DoGSiteScorerAPI could encapsulate all functionalities required to interact with the DoGSiteScorer API, including submitting jobs, uploading PDB files, and fetching results. This class would act as an API client dedicated to DoGSiteScorer.

Pocket Detection (get_pocket.py):

A PocketDetector class can encapsulate functionalities for detecting and processing pockets in protein structures, leveraging the methods already defined in the script.

Library Preparation (library_preparation.py):

Consider a class LibraryPreparer that includes methods for molecule standardization, conformer generation, and any preprocessing steps required before docking simulations. This class centralizes the preprocessing steps into a single, reusable component.

Performance Calculation (performance_calculation.py):

A class PerformanceEvaluator could contain methods for calculating various performance metrics for docking or virtual screening results. This class would provide a structured way to assess the effectiveness of the docking process.

Utilities (utilities.py):

While utilities often remain as a collection of standalone functions, consider grouping related utilities into classes if they share common data or are used together frequently.
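To make the static-method suggestion concrete, here is a rough sketch of what a ClusteringMetrics class could look like. It does not reproduce the existing simpleRMSD_calc; the method names, the coordinate-list input and the `pairwise_matrix` helper are assumptions for illustration.

```python
import math


class ClusteringMetrics:
    """Groups pairwise pose metrics as static methods (hypothetical layout)."""

    @staticmethod
    def simple_rmsd(coords_a, coords_b):
        """RMSD between two poses with identical atom ordering, without alignment."""
        if len(coords_a) != len(coords_b) or not coords_a:
            raise ValueError("Poses must have the same, non-zero number of atoms")
        squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
        return math.sqrt(squared / len(coords_a))

    @classmethod
    def pairwise_matrix(cls, poses, metric="simple_rmsd"):
        """Build the full pose-by-pose matrix for the metric with the given name."""
        func = getattr(cls, metric)
        return [[func(p, q) for q in poses] for p in poses]
```

Looking the metric up by name keeps the clustering code independent of which metric is selected, so adding a metric only means adding another static method.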

Reorganise Scoring Functions with classes for each function

Create conda package

Check if links are updated

@Tonylac77 maybe double check a few links, I was just clicking on the automatic set-up link and it forwards me to the gitlab page ;)
https://gitlab.com/Tonylac77/DockM8/-/blob/main/setup_py310.sh

Split pocket finding scripts and add pocket_finding folder

Split pocket finding scripts and add pocket_finding folder, for clarity and future conversion to classes

Implement Active learning for docking score prediction

Implement one or several of the available active-learning libraries for docking score prediction

HASTEN (https://github.com/TuomoKalliokoski/HASTEN)
DeepDocking (https://github.com/jamesgleave/DD_protocol)
MolPal (https://github.com/coleygroup/molpal)

Implement as a wrapper around DockM8: the first iteration uses random compound selection, then runs the DockM8 workflow.

Scores should be standardized according to the 1st and 99th percentile (or choosable threshold) to ensure consistency across the chemical space of the dataset.
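One way to read the percentile-based standardization is as a clipped linear rescaling, where the lower percentile maps to 0 and the upper percentile maps to 1. The sketch below makes that assumption; the function name and the choice to clip outliers are not taken from DockM8.

```python
from statistics import quantiles


def percentile_scale(scores, lower_pct=1, upper_pct=99):
    """Rescale scores so the chosen percentiles map to 0 and 1, clipping outliers."""
    # With n=100, cuts[k - 1] is the k-th percentile of the data
    cuts = quantiles(scores, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    if high == low:
        # All scores are effectively identical, so there is nothing to scale
        return [0.0 for _ in scores]
    return [min(1.0, max(0.0, (s - low) / (high - low))) for s in scores]
```

Using percentiles instead of the minimum and maximum keeps a few extreme docking scores from compressing the rest of the distribution.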

Implement --mode in dockm8.py to handle the active learning mode.

Implement Pytest testing framework

Error while running dockm8.ipynb

if prepare_protein == True:
    prepared_receptor = prepare_protein_protoss(receptor)
else:
    prepared_receptor = receptor

#Create a temporary folder for all further calculations
w_dir = prepared_receptor.parent / prepared_receptor.stem
print('The working directory has been set to:', w_dir)
(w_dir).mkdir(exist_ok=True)

if pocket == 'Reference':
    pocket_definition = get_pocket(ref_file, prepared_receptor, 10)
    print(pocket_definition)
if pocket == 'RoG':
    pocket_definition = get_pocket_RoG(ref_file, prepared_receptor)
    print(pocket_definition)
elif pocket == 'Dogsitescorer':
    pocket_definition = binding_site_coordinates_dogsitescorer(prepared_receptor, w_dir, method='Volume')

This code gave the following error.

{
"name": "JSONDecodeError",
"message": "Expecting value: line 1 column 1 (char 0)",
"stack": "---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
File ~/opt/anaconda3/envs/dockm8/lib/python3.10/site-packages/requests/models.py:974, in Response.json(self, **kwargs)
973 try:
--> 974 return complexjson.loads(self.text, **kwargs)
975 except JSONDecodeError as e:
976 # Catch JSON-related errors and raise as requests.JSONDecodeError
977 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of s (a str instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError Traceback (most recent call last)
Cell In[4], line 2
1 if prepare_protein == True:
----> 2 prepared_receptor = prepare_protein_protoss(receptor)
3 else:
4 prepared_receptor = receptor

File ~/Desktop/tik/DockM8/scripts/protein_preparation.py:81, in prepare_protein_protoss(receptor)
79 query = {'protein_file': upload_file}
80 # Submit the job to ProtoSS and get the job submission response
---> 81 job_submission = requests.post(PROTOSS, files=query).json()
83 # Poll the job status until it is completed
84 protoss_job = poll_job(job_submission['job_id'], PROTOSS_JOBS)

File ~/opt/anaconda3/envs/dockm8/lib/python3.10/site-packages/requests/models.py:978, in Response.json(self, **kwargs)
974 return complexjson.loads(self.text, **kwargs)
975 except JSONDecodeError as e:
976 # Catch JSON-related errors and raise as requests.JSONDecodeError
977 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 978 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)"
}
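The message "Expecting value: line 1 column 1 (char 0)" means the reply body did not start with a JSON value, for example because it was empty or an HTML error page. The traceback does not show why the server replied that way. The sketch below shows one way the submission step could report this more clearly; `parse_job_submission` is a hypothetical helper, and only the `job_id` key is taken from the traceback above.

```python
def parse_job_submission(response):
    """Return the job id from a submission response, or raise a readable error.

    `response` is expected to behave like `requests.Response`
    (attributes `status_code` and `text`, method `json()`).
    """
    if response.status_code >= 400:
        raise RuntimeError(
            f"Server returned HTTP {response.status_code}: {response.text[:200]!r}"
        )
    try:
        payload = response.json()
    except ValueError as err:  # requests' JSONDecodeError is a ValueError subclass
        raise RuntimeError(
            f"Server reply was not JSON: {response.text[:200]!r}"
        ) from err
    if "job_id" not in payload:
        raise RuntimeError(f"Reply has no 'job_id': {payload!r}")
    return payload["job_id"]
```

With such a helper, the call in prepare_protein_protoss could pass the result of `requests.post(PROTOSS, files=query)` to it instead of calling `.json()` directly, so the error would include the start of the server's actual reply.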

Improve software installation using Python scripts

Improve software installation so that when a tool is selected, it is automatically downloaded if not present in the software directory
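A sketch of the download-if-missing behaviour, assuming each tool is a single file fetched from a URL. The function name, its parameters and the injectable `download` callable are hypothetical; real tools may need unpacking or build steps that are not shown.

```python
from pathlib import Path
from urllib.request import urlretrieve


def ensure_tool(name, url, software_dir, download=urlretrieve):
    """Return the path to a tool, downloading it only if it is missing."""
    software_dir = Path(software_dir)
    target = software_dir / name
    if target.exists():
        # Already installed: skip the download
        return target
    software_dir.mkdir(parents=True, exist_ok=True)
    download(url, str(target))
    return target
```

Passing the download function as a parameter makes the check easy to test without network access.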

Create multipage streamlit app

Add pages for

Also add a Guided Mode vs Expert Mode

GypsumDL adds an extra molecule to the final_library.sdf file to store its parameters; remove it
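A possible approach, sketched under the assumption that the parameter entry is an SDF record with zero atoms in its V2000 counts line. This should be checked against real GypsumDL output before use; the function name is hypothetical.

```python
def drop_empty_records(sdf_text):
    """Remove SDF records whose V2000 counts line reports zero atoms."""
    kept, record = [], []
    for line in sdf_text.splitlines():
        record.append(line)
        if line.strip() != "$$$$":
            continue  # still inside the current record
        # The counts line is the 4th line of a molblock; atom count is columns 1-3
        counts = record[3] if len(record) > 3 else ""
        is_empty_v2000 = "V3000" not in counts and counts[:3].strip() == "0"
        if not is_empty_v2000:
            kept.extend(record)
        record = []
    kept.extend(record)  # keep any trailing lines without a terminator
    return "\n".join(kept) + ("\n" if kept else "")
```

V3000 records are left untouched because their counts line always reads zero in the legacy columns.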

Add new binding site detection options

IF-SitePred : github
P2Rank : github

Implement PyPackIT as a package manager

https://github.com/RepoDynamics/PyPackIT

Allow for active compound retrieval from CHEMBL and other datasets for decoy generation

Collect the known actives (positive controls) for retrospective analysis. For a given target, these can be found in the scientific literature, patent literature or public databases such as IUPHAR/BPS [114], ChEMBL [115] or ZINC [9, 99, 116], or may be available in-house.

While it may be possible to find dozens of actives, it is likely that many come from the same chemical series. For a rigorous control analysis, redundant (i.e., highly similar) compounds should be clustered and the most potent compound selected.
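The clustering step could be sketched as a greedy leader (sphere-exclusion) selection: actives are visited from most to least potent, and a compound is kept only if it is not too similar to one already kept. This is one simple technique among several; the function name, the input layout, the 0.6 threshold and the user-supplied `similarity` callable are all assumptions.

```python
def select_representatives(actives, similarity, threshold=0.6):
    """Pick the most potent compound of each group of similar actives.

    actives: list of (name, potency, features); higher potency is better (e.g. pIC50)
    similarity: callable taking two feature objects and returning a value in [0, 1]
    """
    leaders = []
    # Visiting by decreasing potency makes each leader the most potent of its group
    for name, potency, features in sorted(actives, key=lambda a: a[1], reverse=True):
        if all(similarity(features, kept[2]) < threshold for kept in leaders):
            leaders.append((name, potency, features))
    return [name for name, _, _ in leaders]
```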
