SEMA
SEMA (Spatial Epitope Modelling with Artificial intelligence) is a tool for conformational B-cell epitope prediction from the primary protein sequence or the tertiary structure. SEMA combines a sequence-based approach (SEMA-1D) and a structure-based approach (SEMA-3D). The SEMA-1D model is based on an ensemble of ESM-1v transformer protein language models. The SEMA-3D model is based on an ensemble of ESM-IF1 inverse folding models. Both models were fine-tuned to predict the propensity of each antigen amino acid (AA) residue to interact with the Fab regions of immunoglobulins. SEMA provides an interpretable per-residue score: the log-scaled expected number of contacts with antibody residues.
SEMA is also available via a web interface.
Disclaimer:
This code is provided under the MIT License.
Data
The entire data set with contact number per residue can be downloaded from the link.
The dataset contains the following columns:
- pdb_id — identifier in the PDB database
- resi_pos — the residue position in the PDB structure
- resi_name — amino acid 3-letter name
- res_aa — amino acid symbol
- anigen_chain — name of the antigen chain in the PDB structure
- fab_chains — names of the antibody chains in the PDB structure
- contact_number_R1=i_R2=j — contact number, calculated as the number of antibody residues in contact with any atom of the antigen residue within the distance radius R1 (i and j denote the chosen R1 and R2 values). Residues located between R1 and R2 have a contact number of zero. Residues located outside the R2 distance radius have the value '-100'.
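The labelling rule described for the contact_number column can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the code from the dataset_generation directory: the function name `contact_label`, the input format (one minimum distance per antibody residue, in angstroms), and the inclusive boundaries at R1 and R2 are all assumptions.

```python
IGNORE_LABEL = -100  # value described above for residues outside R2


def contact_label(antibody_distances, r1, r2):
    """Return a contact-number label for one antigen residue.

    antibody_distances: hypothetical input, the minimum atom-atom distance
    from the antigen residue to each antibody residue.
    """
    # No antibody residue within R2: the residue gets the '-100' value.
    if not antibody_distances or min(antibody_distances) > r2:
        return IGNORE_LABEL
    # Count antibody residues within R1. If the nearest one lies between
    # R1 and R2, the count is zero, matching the rule above.
    return sum(1 for d in antibody_distances if d <= r1)


print(contact_label([5.0, 7.5, 12.0], r1=8.0, r2=16.0))
print(contact_label([10.0, 20.0], r1=8.0, r2=16.0))
print(contact_label([17.0, 30.0], r1=8.0, r2=16.0))
```

The three calls cover the three cases: a residue in contact, a residue between R1 and R2, and a residue outside R2.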
You can generate your own dataset with different R1 and R2 values using the scripts in the dataset_generation directory.
The data directory contains example training and test sets and an example PDB file for SEMA-3D inference.
Environment creation
Python 3.8 is required.
virtualenv sema_env
source sema_env/bin/activate
pip install -r requirements.txt
Training the models
To train a model, use the Jupyter notebooks SEMA-1D_finetuning or SEMA-3D_finetuning.
To train the SEMA-3D model, you will need to download additional data:
a pickle file with a dataset of tertiary structures or the original set of processed PDB files.
Inference
Prepare your model weights or download ours:
- Create the models directory:
mkdir models
- Go into the directory:
cd models
- Download weights for SEMA-1D:
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_0.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_0.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_1.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_1.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_2.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_2.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_3.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_3.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_4.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_4.pth
- Or download weights for SEMA-3D:
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_0.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_0.pt
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_1.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_1.pt
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_2.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_2.pt
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_3.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_3.pt
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_4.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_4.pt
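As an alternative to typing the five SEMA-1D wget commands, the same downloads can be scripted. The sketch below is not part of the repository; the helper names `sema_1d_weight_files` and `download_sema_1d_weights` are hypothetical, and it assumes it is run from the repository root, so it creates the models directory itself. It reuses the base URL and file names from the commands above.

```python
import os
from urllib.request import urlretrieve

# Base URL copied from the wget commands above.
BASE_URL = (
    "https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights"
)


def sema_1d_weight_files(n_models=5):
    """File names of the SEMA-1D ensemble checkpoints (indices 0..n_models-1)."""
    return [f"sema_1d_ft_cn_atom_r1_8.0_r2_16.0_{i}.pth" for i in range(n_models)]


def download_sema_1d_weights(target_dir="models"):
    """Download every SEMA-1D checkpoint into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    for name in sema_1d_weight_files():
        urlretrieve(f"{BASE_URL}/{name}", os.path.join(target_dir, name))
```

Calling `download_sema_1d_weights()` performs the downloads; nothing is fetched when the module is only imported.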
Next, run inference using the Jupyter notebooks SEMA-1D_inference or SEMA-3D_inference.
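Because each SEMA model is an ensemble of five checkpoints, the per-residue scores from the individual checkpoints have to be combined into one prediction. This README does not state how the notebooks do that, so the sketch below simply assumes an unweighted mean. The function name `average_ensemble` and the plain-list input format are hypothetical.

```python
from statistics import fmean


def average_ensemble(per_model_scores):
    """Average per-residue scores across models.

    per_model_scores: one list of per-residue scores per checkpoint,
    all of the same length (one entry per residue).
    """
    lengths = {len(scores) for scores in per_model_scores}
    if len(lengths) != 1:
        raise ValueError("every model must score the same number of residues")
    # zip(*...) groups the scores for each residue across all models.
    return [fmean(residue_scores) for residue_scores in zip(*per_model_scores)]


print(average_ensemble([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```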