SEMA

SEMA (Spatial Epitope Modelling with Artificial intelligence) is a tool for conformational B-cell epitope prediction from the primary protein sequence or tertiary structure. It offers a sequence-based approach (SEMA-1D) and a structure-based approach (SEMA-3D). The SEMA-1D model is based on an ensemble of ESM-1v transformer protein language models. The SEMA-3D model is based on an ensemble of ESM-IF1 inverse folding models. Both models were fine-tuned to predict the propensity of each antigen amino acid (AA) residue to interact with Fab regions of immunoglobulins. SEMA provides an interpretable score indicating the log-scaled expected number of contacts with antibody residues.

SEMA is also available via a web interface.

Disclaimer:

This code is provided under the MIT License.

Data

The entire dataset with per-residue contact numbers can be downloaded from the link.
The dataset contains the following columns:

pdb_id: identifier in the PDB database
resi_pos: the residue position in the PDB structure
resi_name: amino acid 3-letter name
res_aa: amino acid 1-letter symbol
anigen_chain: name of the antigen chain in the PDB structure
fab_chains: names of the antibody chains in the PDB structure
contact_number_R1=i_R2=j: contact number, calculated as the number of antibody residues in contact with any atom of the antigen residue within the distance radius R1. Residues between R1 and R2 have a contact number of zero. Residues located outside the R2 distance radius have a value of '-100'.
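As a rough illustration of how these columns might be consumed, the sketch below parses a few made-up rows with Python's csv module and separates labeled residues from those marked '-100'. The CSV layout, the sample rows, and the concrete column name contact_number_R1=8.0_R2=16.0 are assumptions made for illustration; check the downloaded file for its actual format.

```python
import csv
import io

# Hypothetical column name for R1 = 8.0 and R2 = 16.0; verify against the real file.
TARGET = "contact_number_R1=8.0_R2=16.0"
IGNORE_VALUE = -100.0  # residues outside the R2 radius carry this value

# Made-up rows, used only to keep the example self-contained.
SAMPLE = """pdb_id,resi_pos,resi_name,res_aa,anigen_chain,fab_chains,contact_number_R1=8.0_R2=16.0
1abc,10,ALA,A,C,HL,3
1abc,11,GLY,G,C,HL,0
1abc,12,SER,S,C,HL,-100
"""


def split_labeled(rows, target=TARGET):
    """Return (labeled, ignored) lists of (resi_pos, value) pairs."""
    labeled, ignored = [], []
    for row in rows:
        value = float(row[target])
        # resi_pos is kept as a string because PDB positions may carry insertion codes.
        pair = (row["resi_pos"], value)
        if value == IGNORE_VALUE:
            ignored.append(pair)
        else:
            labeled.append(pair)
    return labeled, ignored


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
labeled, ignored = split_labeled(rows)
print("labeled:", labeled)
print("ignored:", ignored)
```

Residues in the ignored list would typically be excluded from the loss during fine-tuning, while zero-valued residues remain as negative examples.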

You can generate your own dataset with different R1 and R2 using scripts in the dataset_generation directory.
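The labeling rule from the column description can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code from dataset_generation: the function name, the input layout (one list of atom coordinates per residue), and the use of the closest antibody atom to choose between 0 and -100 are all assumptions.

```python
import math


def contact_number(antigen_atoms, antibody_residues, r1=8.0, r2=16.0):
    """Label one antigen residue (hypothetical helper).

    antigen_atoms: list of (x, y, z) tuples for the antigen residue's atoms.
    antibody_residues: list of residues, each a non-empty list of (x, y, z) atoms.
    """
    contacts = 0
    closest = math.inf
    for residue in antibody_residues:
        # Shortest atom-to-atom distance between this antibody residue
        # and the antigen residue.
        d = min(math.dist(a, b) for a in antigen_atoms for b in residue)
        closest = min(closest, d)
        if d <= r1:
            contacts += 1
    if contacts > 0:
        return contacts
    # No contact within R1: zero if some antibody atom lies within R2,
    # otherwise the residue is marked with -100.
    return 0 if closest <= r2 else -100


antigen = [(0.0, 0.0, 0.0)]
antibody = [[(5.0, 0.0, 0.0)], [(7.0, 0.0, 0.0), (20.0, 0.0, 0.0)], [(12.0, 0.0, 0.0)]]
print(contact_number(antigen, antibody))
```

Changing r1 and r2 in such a function corresponds to producing a different contact_number_R1=i_R2=j column.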

The data directory contains example training and test sets and an example PDB file for SEMA-3D inference.

Environment creation

Python 3.8 is required.

virtualenv sema_env
source sema_env/bin/activate
pip install -r requirements.txt

Training the models

To train the models, you can use the Jupyter Notebooks SEMA-1D_finetuning or SEMA-3D_finetuning.

If you are training the SEMA-3D model, you will need to download additional data:
a pickle file with a dataset of tertiary structures or the original set of processed PDB files.

Inference

Prepare your model weights or download ours:

Create the models directory: mkdir models
Go into the directory: cd models
Download weights for SEMA-1D:

wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_0.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_0.pth 
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_1.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_1.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_2.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_2.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_3.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_3.pth
wget -O sema_1d_ft_cn_atom_r1_8.0_r2_16.0_4.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_1d_ft_cn_atom_r1_8.0_r2_16.0_4.pth

or SEMA-3D:

wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_0.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_0.pt
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_1.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_1.pt 
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_2.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_2.pt 
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_3.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_3.pt 
wget -O sema_3d_cn_atom_r1_8.0_r2_18.0_4.pth https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights/sema_3d_cn_atom_r1_8.0_r2_18.0_4.pt
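For readers who prefer Python over wget, the same file names and URLs can be generated programmatically. The sketch below only mirrors the commands above; the function names weight_files and download_all are hypothetical, and the download step needs network access, so it is defined but not called here.

```python
import os
import urllib.request

BASE_URL = "https://bioinformatics-kardymon.obs.ru-moscow-1.hc.sbercloud.ru/SEMA_weights"


def weight_files(model, n_models=5):
    """Return (local_name, url) pairs for the 'sema_1d' or 'sema_3d' ensemble."""
    if model == "sema_1d":
        stem, remote_ext = "sema_1d_ft_cn_atom_r1_8.0_r2_16.0", ".pth"
    elif model == "sema_3d":
        # The wget commands save the remote .pt files under a .pth name.
        stem, remote_ext = "sema_3d_cn_atom_r1_8.0_r2_18.0", ".pt"
    else:
        raise ValueError(f"unknown model: {model}")
    return [
        (f"{stem}_{i}.pth", f"{BASE_URL}/{stem}_{i}{remote_ext}")
        for i in range(n_models)
    ]


def download_all(model, target_dir="models"):
    """Download every ensemble member into target_dir (requires network access)."""
    os.makedirs(target_dir, exist_ok=True)
    for name, url in weight_files(model):
        urllib.request.urlretrieve(url, os.path.join(target_dir, name))


for name, url in weight_files("sema_3d"):
    print(name, "<-", url)
```

Calling download_all("sema_1d") or download_all("sema_3d") would then fetch the five checkpoints of the chosen ensemble.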

Next, run inference using Jupyter Notebooks SEMA-1D_inference or SEMA-3D_inference.
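Since each model is an ensemble of five checkpoints, per-residue predictions have to be combined in some way. The sketch below shows one simple option, an arithmetic mean over models. That averaging is how the notebooks combine predictions is an assumption here, and average_ensemble is a hypothetical helper, so consult the inference notebooks for the actual procedure.

```python
def average_ensemble(per_model_scores):
    """Average per-residue scores over models.

    per_model_scores: one list of floats per model, all of equal length
    (one score per residue of the same protein).
    """
    if not per_model_scores:
        raise ValueError("no predictions given")
    length = len(per_model_scores[0])
    if any(len(scores) != length for scores in per_model_scores):
        raise ValueError("all models must score the same residues")
    n_models = len(per_model_scores)
    # zip(*...) groups the scores of each residue across models.
    return [sum(column) / n_models for column in zip(*per_model_scores)]


# Made-up scores from two models for a two-residue protein.
print(average_ensemble([[1.0, 2.0], [3.0, 4.0]]))
```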
