deepaffinity's Issues

DeepAffinity ID and Public ID Mapping Table

Thanks for the great package.

I would like a mapping table from DeepAffinity IDs to public IDs, one for proteins and one for drugs,
so that I can interpret the DeepAffinity output.

E.g., for proteins (mapped to UniProt accession):
KV37 | P1234
E.g., for drugs (mapped to PubChem CID):
KV37 | 12345
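
If such a table were provided in the two-column, pipe-separated layout of the examples above, it could be loaded roughly as follows. This is only a sketch: the function name `parse_mapping` and the sample rows are hypothetical and not part of DeepAffinity.

```python
def parse_mapping(text):
    """Parse 'DeepAffinity ID | public ID' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first '|' only and trim whitespace around both fields.
        deepaffinity_id, public_id = (field.strip() for field in line.split("|", 1))
        mapping[deepaffinity_id] = public_id
    return mapping


# Placeholder rows in the format of the examples above (not real mappings).
protein_map = parse_mapping("KV37 | P1234\n")
print(protein_map["KV37"])
```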

Also, could you describe again how you generated the PubChem binary fingerprint for each compound?
-> Input: PubChem CID or SMILES
-> Output: binary (0/1) vector of length 881
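
For context, PubChem distributes its 881-bit substructure fingerprint in encoded form (for example as a Base64 string in the PUBCHEM_CACTVS_SUBSKEYS field of its SDF records), so one way to obtain the 0/1 vector for a known CID is to decode that string. The sketch below is not necessarily how the DeepAffinity authors did it. It assumes the layout described in PubChem's fingerprint specification: a 4-byte big-endian length prefix, then the bits with the most significant bit first, then 7 padding bits. The function name and the synthetic input are hypothetical.

```python
import base64

N_BITS = 881  # length of the PubChem substructure fingerprint


def decode_pubchem_fingerprint(encoded):
    """Decode a Base64 PubChem fingerprint into a list of 881 ints (0 or 1)."""
    raw = base64.b64decode(encoded)
    # Assumed layout: first 4 bytes hold the bit length as a big-endian integer.
    declared = int.from_bytes(raw[:4], "big")
    if declared != N_BITS:
        raise ValueError(f"expected {N_BITS} bits, header says {declared}")
    bits = []
    for byte in raw[4:]:
        # Assumed bit order: most significant bit of each byte comes first.
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    if len(bits) < N_BITS:
        raise ValueError("payload is shorter than the declared bit length")
    return bits[:N_BITS]  # drop the trailing padding bits


# Synthetic input for illustration only: a fingerprint with just bit 0 set.
payload = bytearray(111)
payload[0] = 0x80
example = base64.b64encode((N_BITS).to_bytes(4, "big") + bytes(payload)).decode("ascii")
vector = decode_pubchem_fingerprint(example)
print(len(vector), vector[:4])
```

Starting from a SMILES string instead of a CID would additionally require a cheminformatics toolkit that implements the PubChem substructure keys, which this sketch does not cover.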

Thanks in advance

Is the tool Mac compatible?


Is DeepAffinity compatible with macOS? I am asking because I am following the installation instructions using the conda environment.yml, and these packages are not found:

  • libstdcxx-ng
  • libgcc-ng
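
Both packages are Linux-only builds of the GCC runtime libraries, so conda cannot resolve them on macOS. One possible workaround, sketched below, is to drop such entries from a copy of environment.yml before creating the environment. The helper name, the sample YAML, and its version pins are hypothetical, and the sketch assumes that the remaining pinned packages have macOS builds, which may not hold.

```python
LINUX_ONLY = ("libstdcxx-ng", "libgcc-ng")


def drop_linux_only(yaml_text, names=LINUX_ONLY):
    """Return the YAML text without dependency lines for the given packages."""
    kept = []
    for line in yaml_text.splitlines():
        # Turn '  - libgcc-ng=7.2.0=abc' into 'libgcc-ng=7.2.0=abc'.
        entry = line.strip().lstrip("-").strip()
        # The package name is everything before the first '='.
        package = entry.split("=", 1)[0]
        if package in names:
            continue  # skip Linux-only packages
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical excerpt; the real environment.yml will contain more entries.
sample = """name: deepaffinity_example
dependencies:
  - python=3.6
  - libgcc-ng=7.2.0
  - libstdcxx-ng=7.2.0
"""
print(drop_linux_only(sample))
```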

Many thanks

Trained Models


Is it possible to share the trained models?


why tflearn?

tflearn is no longer updated; it only supports TF1 and now seems to be abandoned.
Why did you use tflearn when writing the code?

Is it because you like tflearn?

Can't create environment

Hi, thanks for your awesome work on DTA prediction. I really appreciate your team!

I want to generate SPS sequences from my own protein data, but I'm struggling to use the SSpro/ACCpro tool.
I referred to this manual and tried to create the conda environment for running SSpro/ACCpro, but creating the environment failed.

I'm not a CS major, so I'm sorry if my question is too basic 😭.

> Conda info
active environment : base
active env location : /home/miruware/anaconda3
shell level : 1
user config file : /home/miruware/.condarc
populated config files :
conda version : 4.10.3
conda-build version : 3.20.5
python version :
virtual packages : __cuda=11.4=0
base environment : /home/miruware/anaconda3 (writable)
conda av data dir : /home/miruware/anaconda3/etc/conda
conda av metadata url : None
channel URLs :
package cache : /home/miruware/anaconda3/pkgs
envs directories : /home/miruware/anaconda3/envs
platform : linux-64
user-agent : conda/4.10.3 requests/2.24.0 CPython/3.8.5 Linux/5.15.0-60-generic ubuntu/20.04.2 glibc/2.31
UID:GID : 1000:1000
netrc file : None
offline mode : False

Both Training and Testing with my own dataset.

Hi, I appreciate your nice work, which gives us insight into SPS.

I read the README pdf file for running your model on my own datasets, but I couldn't find how to run the model from scratch with my own datasets.

I only found instructions for "training with your dataset and testing on my own dataset", or
"using the model pretrained on your dataset and testing on my own dataset".

What I want to do is train on my own dataset and also test on my own dataset, for a fair comparison.

Thanks in advance!

Checkpoint for ki dataset


Thanks for sharing this great work.

Is it true that the checkpoints you provided are for the IC50 dataset only? If so, do you have checkpoints for the Ki dataset as well?

Thank you!

About 'uniprot.human.scratch_outputs.w_sps.tab_corrected' file

Hello, I am very interested in your work.

I found the 'uniprot.human.scratch_outputs.w_sps.tab_corrected' file
at the path '/data/dataset/uniprot.human.scratch_outputs.w_sps.tab_corrected/'.
It seems to contain an SPS mapping,
but its SPS differs from the one in the 'protein_sequence_SPS_mapping' file.

For example, take 'CP3A4', whose sequence is

In the 'protein_sequence_SPS_mapping' file, the SPS of this protein is

However, in the 'uniprot.human.scratch_outputs.w_sps.tab_corrected' file, the SPS of this protein is

I would like to know why they differ even though they have the same protein sequence.
Also, is it OK to use the SPS from the 'uniprot.human.scratch_outputs.w_sps.tab_corrected' file for training and testing the DeepAffinity model?

Thank you,

TypeError when running

I'm trying to use DeepAffinity with the pretrained model, but this error appeared:

Tokenizing data in ./data/test_sps
Tokenizing data in ./data/test_smile
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/python/framework/", line 491, in apply_op
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/python/framework/", line 704, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/python/framework/", line 577, in _TensorTensorConversionFunction
% (,, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("GRU/GRU/GRUCell/Gates/add:0", shape=(64, 512), dtype=float32)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 230, in
prot_gru_1 = tflearn.gru(prot_embd, GRU_size_prot,initial_state= prot_init_state_1,trainable=True,return_seq=True,restore=True)
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tflearn/layers/", line 294, in gru
scope=scope, name=name)
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tflearn/layers/", line 67, in rnn_template
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/", line 197, in static_rnn
(output, state) = call_cell()
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/", line 184, in
call_cell = lambda: cell(input_, state)
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tflearn/layers/", line 601, in call
self.trainable, self.restore, self.reuse))
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/python/ops/", line 1198, in split
split_dim=axis, num_split=num_or_size_splits, value=value, name=name)
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/python/ops/", line 3306, in _split
num_split=num_split, name=name)
File "/home/recher/anaconda3/envs/deep_affinity/lib/python3.6/site-packages/tensorflow/python/framework/", line 514, in apply_op
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.

Pubchem fingerprints

Is it possible to share the code that was used to generate the pubchem fingerprints?

Generating the protein SPS representation

Hi, I am trying to run DeepAffinity on the PDBbind dataset, and I am running into issues when using SSpro/ACCpro to get the secondary structure and exposedness predictions for the proteins in PDBbind. I followed the installation instructions and was able to run the software on some proteins, but it failed for the majority of them. The error message was pretty cryptic, and I don't think the problem is with the binaries, since I was able to run the software on some of the PDBbind protein sequences. Do you have any insight into this? Or do you perhaps have the processed data for PDBbind? Thanks!

model number in output

What do the model numbers in the DeepAffinity output mean?
(e.g. Model_7414, Model_20432, Model_399544, Model_1452500)

Thanks in advance!!

Why should the number of lines in the SPS-representation protein sequence file match the number of lines in the canonical compound SMILES file?


I am trying to get results for my own data with your model.

(1) According to the file "", it seems that the number of lines in the input protein sequence file and in the compound file must match, as shown below.
[Screenshot, 2022-09-22 10-46-42]
Does this mean that the number of entities in both files has to match, or literally that the number of lines in both files has to match?

(2) I got two files for my own data after following your manual.
Could you tell me whether the structure of their entries is correct as model input?

  • CID_Smi_Feature:
    [Screenshot, 2022-09-22 11-31-44]
  • protein_grouped_finalPresentation:
    [Screenshot, 2022-09-22 11-33-55]

Thank you,
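
One way to check the line-count requirement asked about in (1) before running the model is sketched below. It assumes, without confirmation from the authors, that the two inputs are aligned line by line, so that line i of the protein file is paired with line i of the compound file. The function name and the toy records are hypothetical.

```python
def pair_lines(protein_lines, compound_lines):
    """Pair protein and compound records line by line, or fail loudly."""
    proteins = [line.rstrip("\n") for line in protein_lines]
    compounds = [line.rstrip("\n") for line in compound_lines]
    if len(proteins) != len(compounds):
        raise ValueError(
            f"line counts differ: {len(proteins)} protein lines "
            f"vs {len(compounds)} compound lines"
        )
    # Assumption: record i in one file belongs with record i in the other.
    return list(zip(proteins, compounds))


# Toy records for illustration only (not real SPS strings).
toy_proteins = ["SPS_RECORD_1\n", "SPS_RECORD_2\n"]
toy_compounds = ["CCO\n", "c1ccccc1\n"]
pairs = pair_lines(toy_proteins, toy_compounds)
print(pairs)
```

Because the function accepts any iterable of lines, two open file handles can be passed in place of the toy lists.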



I was looking for the channel, GPCR, ER and kinase datasets with train/test splits, but couldn't find them in the "baseline" folder. Could you please help me?

