cansyl / deepscreen

DEEPScreen: Virtual Screening with Deep Convolutional Neural Networks Using Compound Images

Python 100.00%

drug-discovery drug-repurposing machine-learning deep-learning convolutional-neural-networks 2d-images-of-compounds drug-target-interactions prediction-model bioinformatics cheminformatics

deepscreen's Issues

New compound prediction - not in CHEMBL and doesn't have a CHEMBL_ID

How can I predict DTIs for new compounds using the trained protein models?

Thanks!

Error when running

Hello, I'm interested in your work, so I tried to use the provided training model.
However, I got an error message at the last epoch.

Epoch :99
Training mode: True
Epoch 99 training loss: 3.122581034898758
There was a problem during training performance calculation!
Validation mode: True
There was a problem during validation performance calculation!
There was a problem during test performance calculation!
Traceback (most recent call last):
  File "./bin/main_training.py", line 69, in <module>
    args.dropout, args.epoch, args.en)
  File "/home/njgoo/Data1/program/DEEPScreen/bin/train_deepscreen.py", line 184, in train_validation_test_training
    best_val_test_result_fl.write("Test {}:\t{}\n".format(scr, best_test_performance_dict[scr]))
UnboundLocalError: local variable 'best_test_performance_dict' referenced before assignment

Thanks for your reply!
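
The traceback is consistent with a variable that is only assigned inside a branch that never ran. The log shows the performance calculation failing in every phase, so no "best" result may ever have been stored. The following is a minimal sketch of that failure mode and a defensive fix, not the repository's actual code; `summarise_run` and `epoch_scores` are hypothetical names.

```python
def summarise_run(epoch_scores):
    """Build report lines from per-epoch score dictionaries.

    epoch_scores: list with one entry per epoch; an entry is None when the
    performance calculation failed for that epoch (hypothetical convention).
    """
    best = None  # initialised before the loop, so it can never be unbound
    for scores in epoch_scores:
        if scores is None:
            continue  # corresponds to "There was a problem during ... calculation!"
        if best is None or scores["MCC"] > best["MCC"]:
            best = scores

    if best is None:
        # Without the initialisation above, this is the point where an
        # UnboundLocalError would be raised instead.
        return ["No epoch produced usable scores; check the earlier warnings."]
    return ["Test {}:\t{}".format(name, value) for name, value in best.items()]
```

If every epoch fails, the useful thing to investigate is the cause of the "There was a problem during ..." messages; the UnboundLocalError is only a downstream symptom.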

How to define “bioactivity values”?

“we constructed positive (active) and negative (inactive) training datasets as follows: for each target, compounds with bioactivity values ≤10 μM were selected as positive training samples...”
Could you please explain how "bioactivity values" are defined?
Looking forward to your reply!
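
As a rough illustration of the thresholding in the quoted sentence, the sketch below labels a compound from a single potency value in μM. Two things are assumptions: that the bioactivity value is a measured concentration such as an IC50, and that the inactive cutoff is 20 μM, which is only suggested by the "10.0_20.0" part of the data file name mentioned in another issue. `label_compound` is a hypothetical helper.

```python
ACTIVE_MAX_UM = 10.0    # from the quoted sentence: values <= 10 uM are positives
INACTIVE_MIN_UM = 20.0  # assumption, inferred from "10.0_20.0" in the file name


def label_compound(activity_um):
    """Return 'active', 'inactive', or None for a potency value given in uM."""
    if activity_um <= ACTIVE_MAX_UM:
        return "active"
    if activity_um >= INACTIVE_MIN_UM:
        return "inactive"
    return None  # values between the two cutoffs are left unlabelled


print(label_compound(0.5))   # active
print(label_compound(15.0))  # None
print(label_compound(50.0))  # inactive
```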

DEEPScreen Supporting Data for Output/results

DEEPScreen reports each compound as active or inactive.
Does the output include any binding affinity data? Also, is the accuracy lower when 2D images are used rather than 3D conformation images or SMILES?
Is there a way to run a virtual docking prediction as well, one that reports the binding affinity energy, the binding site, and the size of the predicted binding site?

Receiving "There was a problem during..." Error

Hi, I was trying out the steps listed in README.md, but I realised that main_training.py is in the bin folder. If I enter that folder and run it, I get both "There was a problem during..." error messages. Must I move the scripts out of the bin folder for the command to work?

Dataset in the Code

Hello there,

Thanks for sharing such a nice idea and the code. It is motivating!

Well, I am just beginning to reproduce your code and have encountered an issue. Please correct me if I am wrong. According to the README, the file named 'chembl27_preprocessed_filtered_act_inact_comps_10.0_20.0_blast_comp_0.2.txt' should be the training dataset that you obtained by filtering the ChEMBL v23 data (about 15M data points), right?

So I expected the file to contain 769,935 data points, matching the number in the paper, but I found 2,292,989 target-ligand pairs in it, which is nearly three times as many. Did you update the file with additional data, or do I have to do some data processing to get 769,935 pairs? I am a little confused.

I'd appreciate it if you could help me with this.

Thanks

Possible overfitting on the test set

Hello,

I was going over the code and noticed something strange in train_deepscreen.py. More specifically, I believe there is a problem in line 172.

The code basically checks for every training epoch the performance on the validation and test sets and keeps the epoch with the highest Matthews correlation coefficient. The final performance printed by the model is the best possible test set performance, which suggests that the model overfits the test set.

I am wondering about the rationale behind the choice, so I would appreciate it if you could share more info.

Best,
Dimitrios
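
One common way to address this concern is to choose the epoch using the validation score only, and then report the test score of that same epoch. The sketch below shows the difference on invented numbers. It is not taken from train_deepscreen.py, and `pick_epoch_by_validation` is a hypothetical helper.

```python
def pick_epoch_by_validation(val_mcc, test_mcc):
    """Return (epoch, test MCC at that epoch) for the highest validation MCC."""
    best_epoch = max(range(len(val_mcc)), key=lambda e: val_mcc[e])
    return best_epoch, test_mcc[best_epoch]


# Invented per-epoch scores, for illustration only.
val_mcc = [0.30, 0.42, 0.40]
test_mcc = [0.28, 0.35, 0.45]

epoch, reported = pick_epoch_by_validation(val_mcc, test_mcc)
print(epoch, reported)  # epoch chosen without looking at the test set
print(max(test_mcc))    # optimistic figure obtained by selecting on the test set
```

Selecting on the test scores directly would report the largest value in `test_mcc`, which is an optimistic estimate because the test set then takes part in model selection.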

ValueError: cannot reshape array of size 13797420 into shape (200,200,1)

I am attempting to reproduce the results in your paper and then train models on my own dataset, but several models failed to train with "ValueError: cannot reshape array".

Any idea how to fix this?

Traceback (most recent call last):
  File "trainDEEPScreenDUDE.py", line 226, in <module>
    trainModelTarget(model_name, trgt, optim, learning_rate, n_epoch, n_of_h1, n_of_h2, dropout_keep_rate, rotate,save_model)
  File "trainDEEPScreenDUDE.py", line 51, in trainModelTarget
    X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
ValueError: cannot reshape array of size 13797420 into shape (200,200,1)
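
A quick arithmetic check helps narrow this down: 13797420 is not a multiple of 200 × 200 = 40000, so at least one image in the batch cannot be a single-channel 200 × 200 array. The sketch below, which assumes you can collect each image's `.shape` before stacking, flags the offending entries. `reshape_is_possible` and `find_mismatched` are hypothetical helpers, not part of DEEPScreen.

```python
IMG_SIZE = 200  # taken from the reshape target in the traceback


def reshape_is_possible(total_elements, img_size=IMG_SIZE):
    """True if total_elements splits evenly into img_size x img_size x 1 images."""
    return total_elements % (img_size * img_size) == 0


def find_mismatched(shapes, img_size=IMG_SIZE):
    """Return indices of images whose shape is not (img_size, img_size)."""
    expected = (img_size, img_size)
    return [i for i, shape in enumerate(shapes) if tuple(shape) != expected]


print(reshape_is_possible(13797420))  # False
# Hypothetical shapes: one cropped image and one three-channel image.
print(find_mismatched([(200, 200), (200, 199), (200, 200, 3)]))
```

Images that are flagged would need to be regenerated or resized to 200 × 200 grayscale before training.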

How do I use a pre-trained model to generate predictions?

I'm new to machine learning and am trying to use DEEPScreen to generate predictions for some new molecules. I want to use a pre-trained model and don't want to train it each time. How would you recommend I do it? I'm also unsure about how to read the input images. They're in a directory.

thank you!
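
For the part of the question about reading images from a directory, a first step could be to collect the image paths keyed by compound identifier. This sketch only covers that step and assumes one PNG per compound named `<compound_id>.png`, which is a guess about the layout. Loading the trained model and running inference is not shown because it depends on the repository's own scripts. `list_compound_images` is a hypothetical helper.

```python
from pathlib import Path


def list_compound_images(image_dir, suffix=".png"):
    """Map compound id (file name without extension) to its image path."""
    paths = sorted(Path(image_dir).glob("*" + suffix))
    return {path.stem: path for path in paths}
```

Calling `list_compound_images("my_images")` on your own folder would give a dictionary you can iterate over when feeding images to whatever prediction routine you end up using.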

How to screen for drugs with a protein not found in the file?

Greetings sir,

I want to use your model to screen for drugs for a protein not found in the file that contains the protein names.
Could you please help me?

Also, if I want to screen certain drugs from databases, how can I do this?

Thanks

UnboundLocalError: local variable 'best_test_performance_dict' referenced before assignment

Hi!

python main_training.py --targetid CHEMBL286 --model CNNModel1 --fc1 256 --fc2 128 --lr 0.01 --bs 64 --dropout 0.25 --epoch 100 --en my_chembl286_training

For the tutorial (e.g. the training step), the code does not run normally.
Could you check this, please?

test_threshold problem and zip file problem

Hello dear tuncadogan,

deepscreen_models_hyperparameters_performance_results.tsv does not have a column called 'test threshold', which the program needs when predicting DTIs. Could you please tell me exactly what it means and how I can supply a valid value for it?

Some zip files are damaged and I cannot open them. How can I use them? (These files are needed when training the DEEPScreen system.)

Thank you very much.

Training a model for another species

Hi,
If I have drug-target interactions for other organisms (not human), is it possible to run the training model,
or is it specific to human?